WO 2004/046382 



PCT/GB2003/005102 



Product and Method

The present invention relates to oligonucleotide 
probes, for use in assessing gene transcript levels in a
cell, which may be used in analytical techniques,
particularly diagnostic techniques. Conveniently the
probes are provided in kit form. Different sets of
probes may be used in techniques to prepare gene
expression patterns and identify, diagnose or monitor
different states, such as diseases, conditions or stages 
thereof. Also provided are methods of identifying 
suitable probes and their use in methods of the 
invention. 

The identification of quick and easy methods of 
sample analysis for, for example, diagnostic 
applications, remains the goal of many researchers. End 
users seek methods which are cost effective, produce 
statistically significant results and which may be 
implemented routinely without the need for highly 
skilled individuals. 

The analysis of gene expression within cells has 
been used to provide information on the state of those 
cells and importantly the state of the individual from 
which the cells are derived. The relative expression of 
various genes in a cell has been identified as 
reflecting a particular state within a body. For 
example, cancer cells are known to exhibit altered 
expression of various proteins and the transcripts or 
the expressed proteins may therefore be used as markers 
of that disease state. 

Thus biopsy tissue may be analysed for the presence 
of these markers and cells originating from the site of 
the disease may be identified in other tissues or fluids 
of the body by the presence of the markers. 
Furthermore, products of the altered expression may be
released into the blood stream and these products may be
analysed. In addition, cells which have contacted
disease cells may be affected by their direct contact
with those cells, resulting in altered gene expression,
and their expression or products of expression may be
similarly analysed.

However, there are some limitations with these
methods. For example, the use of specific tumour
markers for identifying cancer suffers from a variety of
defects, such as lack of specificity or sensitivity,
association of the marker with disease states besides
the specific type of cancer, and difficulty of detection
in asymptomatic individuals.

In addition to the analysis of one or two marker
transcripts or proteins, more recently, gene expression
patterns have been analysed. Most of the work involving
large-scale gene expression analysis with implications
in disease diagnosis has involved clinical samples
originating from diseased tissues or cells. For
example, several recent publications, which demonstrate
that gene expression data can be used to distinguish
between similar cancer types, have used clinical samples
from diseased tissues or cells (Alon et al., 1999, PNAS,
96, p6745-6750; Golub et al., 1999, Science, 286,
p531-537; Alizadeh et al., 2000, Nature, 403, p503-511;
Bittner et al., 2000, Nature, 406, p536-540).

However, these methods have relied on analysis of a 
sample containing diseased cells or products of those 
cells or cells which have been contacted by disease 
cells. Analysis of such samples relies on knowledge of
the presence of a disease and its location, which may be
difficult in asymptomatic patients. Furthermore,
samples cannot always be taken from the disease site,
e.g. in diseases of the brain.

In a finding of great significance, the present
inventors identified the previously untapped potential
of all cells within a body to provide information
relating to the state of the organism from which the
cells were derived. WO98/49342 describes the analysis
of the gene expression of cells distant from the site of 
disease, e.g. peripheral blood collected distant from a 
cancer site. 

This finding is based on the premise that the 
different parts of an organism's body exist in dynamic 
interaction with each other. When a disease affects one 
part of the body, other parts of the body are also 
affected. The interaction results from a wide spectrum 
of biochemical signals that are released from the 
diseased area, affecting other areas in the body. 
Although the nature of the biochemical and
physiological changes induced by the released signals
can vary in the different body parts, the changes can be
measured at the level of gene expression and used for 
diagnostic purposes. 

The physiological state of a cell in an organism is 
determined by the pattern with which genes are expressed 
in it. The pattern depends upon the internal and 
external biological stimuli to which said cell is 
exposed, and any change either in the extent or in the 
nature of these stimuli can lead to a change in the 
pattern with which the different genes are expressed in 
the cell. There is a growing understanding that by 
analysing the systemic changes in gene expression 
patterns in cells in biological samples, it is possible 
to provide information on the type and nature of the 
biological stimuli that are acting on them. Thus, for
example, by monitoring the expression of a large number
of genes in cells in a test sample, it is possible to 
determine whether their genes are expressed with a 
pattern characteristic for a particular disease, 
condition or stage thereof. Measuring changes in gene 
activities in cells, e.g. from tissue or body fluids is 
therefore emerging as a powerful tool for disease
diagnosis.

Such methods have various advantages. Often,
obtaining clinical samples from certain areas in the
body that are diseased can be difficult and may involve
undesirable invasions in the body, for example biopsy is
often used to obtain samples for cancer. In some cases,
such as in Alzheimer's disease, the diseased brain
specimen can only be obtained post-mortem. Furthermore,
the tissue specimens which are obtained are often
heterogeneous and may contain a mixture of both diseased
and non-diseased cells, making the analysis of generated
gene expression data both complex and difficult.

It has been suggested that a pool of tumour tissues
that appear to be pathogenetically homogeneous with
respect to morphological appearances of the tumour may
well be highly heterogeneous at the molecular level
(Alizadeh, 2000, supra), and in fact might contain
tumours representing essentially different diseases
(Alizadeh, 2000, supra; Golub, 1999, supra). For the
purpose of identifying a disease, condition, or a stage
thereof, any method that does not require clinical
samples to originate directly from diseased tissues or
cells is highly desirable since clinical samples
representing a homogeneous mixture of cell types can be
obtained from an easily accessible region in the body.

We have now identified a set of probes of
surprising utility for identifying one or more diseases.
Thus, we now describe probes and sets of probes derived
from cells which are not disease cells and which have
not contacted disease cells, which correspond to genes
which exhibit altered expression in normal versus
disease individuals, for use in methods of identifying,
diagnosing or monitoring certain conditions,
particularly diseases or stages thereof.

Thus the invention provides a set of
oligonucleotide probes which correspond to genes in a
cell whose expression is affected in a pattern
characteristic of a particular disease, condition or
stage thereof, wherein said genes are systemically
affected by said disease, condition or stage thereof.
Preferably said genes are metabolic or house-keeping
genes and preferably are constitutively moderately or
highly expressed. Preferably the genes are moderately
or highly expressed in the cells of the sample but not
in cells from disease cells or in cells having contacted
such disease cells.

Such probes, particularly when isolated from cells
distant to the site of disease, do not rely on the
development of disease to clinically recognizable levels
and allow detection of a disease or condition or stage
thereof very early after the onset of said disease or
condition, even years before other subjective or
objective symptoms appear.

As used herein "systemically" affected genes refers
to genes whose expression is affected in the body
without direct contact with a disease cell or disease
site and the cells under investigation are not disease
cells.

"Contact" as referred to herein refers to cells
coming into close proximity with one another such that
the direct effect of one cell on the other may be
observed, e.g. an immune response, wherein these
responses are not mediated by secondary molecules
released from the first cell over a large distance to
affect the second cell. Preferably contact refers to
physical contact, or contact that is as close as is
sterically possible; conveniently, cells which contact
one another are found in the same unit volume, for
example within 1 cm³.

A "disease cell" is a cell manifesting phenotypic
changes and is present at the disease site at some time
during its life-span, e.g. a tumour cell at the tumour
site or which has disseminated from the tumour, or a
brain cell in the case of brain disorders such as
Alzheimer's disease.

"Metabolic" or "house-keeping" genes refer to those
genes responsible for expressing products involved in
cell division and maintenance, e.g. non-immune function
related genes.

"Moderately or highly" expressed genes refers to
those present in resting cells in a copy number of more
than 30-100 copies/cell (assuming an average 3x10^5 mRNA
molecules in a cell).
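
By way of illustration only, the copy-number definition above can be
expressed as a fraction of the total mRNA pool. The following is a
minimal sketch: the function names are hypothetical, and the default
cut-off of 30 copies/cell is an assumption taken from the lower end of
the 30-100 range stated in the text.

```python
# Average number of mRNA molecules per cell, as assumed in the text.
MRNA_PER_CELL = 3e5


def fraction_of_mrna_pool(copies_per_cell, total=MRNA_PER_CELL):
    """Return the share of the cell's mRNA pool made up by one transcript."""
    return copies_per_cell / total


def is_moderately_or_highly_expressed(copies_per_cell, threshold=30):
    """Apply the copy-number cut-off.

    threshold is an assumption: the text gives a range of 30-100
    copies/cell, and the lower bound is used here by default.
    """
    return copies_per_cell > threshold


if __name__ == "__main__":
    for copies in (30, 100):
        share = fraction_of_mrna_pool(copies)
        print(f"{copies} copies/cell = {share:.4%} of the mRNA pool")
```

On these figures, a transcript at the cut-off represents roughly 0.01%
to 0.03% of the mRNA molecules in the cell.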

Specific probes having the above described
properties are provided herein.

Thus in one aspect, the present invention provides
a set of oligonucleotide probes, wherein said set
comprises at least 10 oligonucleotides selected from:
an oligonucleotide as described in Table 1 or
derived from a sequence described in Table 1, or an
oligonucleotide with a complementary sequence,
or a functionally equivalent oligonucleotide.
"Table 1" as referred to herein refers to Table 1a
and/or Table 1b. Table 1b contains reference to
additional clones and sequences as disclosed herein.
Similarly Tables 2 and 4 comprise 2 parts, a and b.

The invention also provides one or more
oligonucleotide probes, wherein each oligonucleotide
probe is selected from the oligonucleotides listed in
Table 1, or derived from a sequence described in Table
1, or a complementary sequence thereof. The use of such
probes in products and methods of the invention forms
a further aspect of the invention.

As referred to herein an "oligonucleotide" is a
nucleic acid molecule having at least 6 monomers in the
polymeric structure, ie. nucleotides or modified forms
thereof. The nucleic acid molecule may be DNA, RNA or
PNA (peptide nucleic acid) or hybrids thereof or
modified versions thereof, e.g. chemically modified
forms, e.g. LNA (Locked Nucleic Acid), by methylation or
made up of modified or non-natural bases during
synthesis, providing they retain their ability to bind
to complementary sequences. Such oligonucleotides are
used in accordance with the invention to probe target
sequences and are thus referred to herein also as
oligonucleotide probes or simply as probes.

An "oligonucleotide derived from a sequence
described in Table 1" (or any other table) refers to a
part of a sequence disclosed in that Table (e.g. Tables
1-4), which satisfies the requirements of the
oligonucleotide probes as described herein, e.g. in
length and function. Preferably said parts have the size
described hereinafter.

Preferably the oligonucleotide probes forming said
set are at least 15 bases in length to allow binding of
target molecules. Especially preferably said
oligonucleotide probes are from 20 to 200 bases in
length, e.g. from 30 to 150 bases, preferably 50-100
bases in length.

As referred to herein the term "complementary
sequences" refers to sequences with consecutive
complementary bases (ie. T:A, G:C) which are
therefore able to bind to one another through
their complementarity.
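
The base-pairing rule stated above (T:A, G:C) can be sketched in code
as follows. This is an illustrative example only, restricted to
unmodified DNA bases; the function names are hypothetical and do not
appear in the specification.

```python
# Watson-Crick pairing for unmodified DNA bases (T:A, G:C).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the strand that pairs base-for-base with seq (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def are_complementary(seq_a, seq_b):
    """True if seq_b pairs with seq_a over its whole length."""
    return reverse_complement(seq_a) == seq_b.upper()


if __name__ == "__main__":
    probe = "ATGC"
    print(reverse_complement(probe))
    print(are_complementary(probe, "GCAT"))
```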

Reference to "10 oligonucleotides" refers to 10
different oligonucleotides. Whilst a Table 1
oligonucleotide, a Table 1 derived oligonucleotide and
their functional equivalent are considered different
oligonucleotides, complementary oligonucleotides are not
considered different. Preferably however, the at least
10 oligonucleotides are 10 different Table 1
oligonucleotides (or Table 1 derived oligonucleotides or
their functional equivalents). Thus said 10 different
oligonucleotides are preferably able to bind to 10
different transcripts.
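
The counting rule above, under which an oligonucleotide and its
complement are not counted as different, may be sketched as follows.
This is a simplified illustration with hypothetical function names: it
collapses only exact reverse complements and does not attempt to
decide whether two sequences are functional equivalents, which the
text treats as different oligonucleotides.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def canonical_form(seq):
    """Map a sequence and its complement to one shared representative."""
    forward = seq.upper()
    return min(forward, reverse_complement(forward))


def count_different_oligonucleotides(sequences):
    """Count oligonucleotides, treating complementary pairs as one."""
    return len({canonical_form(seq) for seq in sequences})


def has_at_least_ten(sequences):
    return count_different_oligonucleotides(sequences) >= 10
```

For example, a list holding a probe, its complement and one unrelated
probe counts as two different oligonucleotides under this rule.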

Preferably said oligonucleotides are as described
in Table 1 or are derived from a sequence described in
Table 1. Especially preferably said oligonucleotides
are as described in Table 2 or Table 4 or are derived
from a sequence described in either of those tables.
Especially preferably the oligonucleotide (or the
oligonucleotide derived therefrom) has a high occurrence
as defined in Table 3, especially preferably >40%, e.g.
>80 or >90, e.g. 100%.

A "set" as described refers to a collection of
unique oligonucleotide probes (ie. having a distinct
sequence) and preferably consists of less than 1000
oligonucleotide probes, especially less than 500 probes,
e.g. preferably from 10 to 500, e.g. 10 to 100, 200 or
300, especially preferably 20 to 100, e.g. 30 to 100
probes. In some cases less than 10 probes may be used,
e.g. from 2 to 9 probes, e.g. 5 to 9 probes.

It will be appreciated that increasing the number
of probes will prevent the possibility of poor analysis,
e.g. misdiagnosis by comparison to other diseases which
could similarly alter the expression of the particular
genes in question. Other oligonucleotide probes not
described herein may also be present, particularly if
they aid the ultimate use of the set of oligonucleotide
probes. However, preferably said set consists only of
said Table 1 oligonucleotides, Table 1 derived
oligonucleotides, complementary sequences or
functionally equivalent oligonucleotides, or a sub-set
thereof (e.g. of the size as described above),
preferably a sub-set for which sequences are provided
herein (see Table 1 and its footnote). Especially
preferably said set consists only of said Table 1
oligonucleotides, Table 1 derived oligonucleotides, or
complementary sequences thereof, or a sub-set thereof.

Multiple copies of each unique oligonucleotide 
probe, e.g. 10 or more copies, may be present in each 
set, but constitute only a single probe. 

A set of oligonucleotide probes, which may
preferably be immobilized on a solid support or have
means for such immobilization, comprises the at least 10
oligonucleotide probes selected from those described
hereinbefore. Especially preferably said probes are
selected from those having high occurrence as described
in Table 3 and as mentioned above. As mentioned above,
these 10 probes must be unique and have different
sequences. Having said this however, two separate
probes may be used which recognize the same gene but
reflect different splicing events. However
oligonucleotide probes which are complementary to, and
bind to, distinct genes are preferred.

As described herein a "functionally equivalent"
oligonucleotide to those described in Table 1 or derived
therefrom refers to an oligonucleotide which is capable
of identifying the same gene as an oligonucleotide of
Table 1 or derived therefrom, ie. it can bind to the
same mRNA molecule (or DNA) transcribed from a gene
(target nucleic acid molecule) as the Table 1
oligonucleotide or the Table 1 derived oligonucleotide
(or its complementary sequence). Preferably said
functionally equivalent oligonucleotide is capable of
recognizing, ie. binding to the same splicing product as
a Table 1 oligonucleotide or a Table 1 derived
oligonucleotide. Preferably said mRNA molecule is the
full length mRNA molecule which corresponds to the Table
1 oligonucleotide or the Table 1 derived
oligonucleotide.

As referred to herein "capable of binding" or 
"binding" refers to the ability to hybridize under 
conditions described hereinafter. 

Alternatively expressed, functionally equivalent
oligonucleotides (or complementary sequences) have
sequence identity or will hybridize, as described
hereinafter, to a region of the target molecule to which
molecule a Table 1 oligonucleotide or a Table 1 derived
oligonucleotide or a complementary oligonucleotide
binds. Preferably, functionally equivalent
oligonucleotides (or their complementary sequences)
hybridize to one of the mRNA sequences which corresponds
to a Table 1 oligonucleotide or a Table 1 derived
oligonucleotide under the conditions described
hereinafter or have sequence identity to a part of one of
the mRNA sequences which corresponds to a Table 1
oligonucleotide or a Table 1 derived oligonucleotide. A
"part" in this context refers to a stretch of at least
5, e.g. at least 10 or 20 bases, such as from 5 to 100,
e.g. 10 to 50 or 15 to 30 bases.

In a particularly preferred aspect, the
functionally equivalent oligonucleotide binds to all or
a part of the region of a target nucleic acid molecule
(mRNA or cDNA) to which the Table 1 oligonucleotide or
Table 1 derived oligonucleotide binds. A "target"
nucleic acid molecule is the gene transcript or related
product e.g. mRNA, or cDNA, or amplified product
thereof. Said "region" of said target molecule to which
said Table 1 oligonucleotide or Table 1 derived
oligonucleotide binds is the stretch over which
complementarity exists. At its largest this region is
the whole length of the Table 1 oligonucleotide or Table
1 derived oligonucleotide, but may be shorter if the
entire Table 1 sequence or Table 1 derived
oligonucleotide is not complementary to a region of the
target sequence.

Preferably said part of said region of said target
molecule is a stretch of at least 5, e.g. at least 10 or
20 bases, such as from 5 to 100, e.g. 10 to 50 or 15 to
30 bases. This may for example be achieved by said
functionally equivalent oligonucleotide having several
identical bases to the bases of the Table 1
oligonucleotide or the Table 1 derived oligonucleotide.
These bases may be identical over consecutive stretches,
e.g. in a part of the functionally equivalent
oligonucleotide, or may be present non-consecutively,
but provide sufficient complementarity to allow binding
to the target sequence.

Thus in a preferred feature, said functionally
equivalent oligonucleotide hybridizes under conditions
of high stringency to a Table 1 oligonucleotide or a
Table 1 derived oligonucleotide or the complementary
sequence thereof. Alternatively expressed, said
functionally equivalent oligonucleotide exhibits high
sequence identity to all or part of a Table 1
oligonucleotide. Preferably said functionally
equivalent oligonucleotide has at least 70% sequence
identity, preferably at least 80%, e.g. at least 90, 95,
98 or 99%, to all of a Table 1 oligonucleotide or a part
thereof. As used in this context, a "part" refers to a
stretch of at least 5, e.g. at least 10 or 20 bases,
such as from 5 to 100, e.g. 10 to 50 or 15 to 30 bases,
in said Table 1 oligonucleotide. Especially preferably
when sequence identity to only a part of said Table 1
oligonucleotide is present, the sequence identity is
high, e.g. at least 80% as described above.

Functionally equivalent oligonucleotides which
satisfy the above stated functional requirements include
those which are derived from the Table 1
oligonucleotides and also those which have been modified
by single or multiple nucleotide base (or equivalent)
substitution, addition and/or deletion, but which
nonetheless retain functional activity, e.g. bind to the
same target molecule as the Table 1 oligonucleotide or
the Table 1 derived oligonucleotide from which they are
further derived or modified. Preferably said
modification is of from 1 to 50, e.g. from 10 to 30,
preferably from 1 to 5 bases. Especially preferably
only minor modifications are present, e.g. variations in
less than 10 bases, e.g. less than 5 base changes.

Within the meaning of "addition" equivalents are
included oligonucleotides containing additional
sequences which are complementary to the consecutive
stretch of bases on the target molecule to which the
Table 1 oligonucleotide or the Table 1 derived
oligonucleotide binds. Alternatively the addition may
comprise a different, unrelated sequence, which may for
example confer a further property, e.g. to provide a
means for immobilization such as a linker to bind the
oligonucleotide probe to a solid support.

Particularly preferred are naturally occurring
equivalents such as biological variants, e.g. allelic,
geographical or allotypic variants, e.g.
oligonucleotides which correspond to a genetic variant,
for example as present in a different species.

Functional equivalents include oligonucleotides
with modified bases, e.g. using non-naturally occurring
bases. Such derivatives may be prepared during
synthesis or by post production modification.

"Hybridizing" sequences which bind under conditions
of low stringency are those which bind under
non-stringent conditions (for example, 6x SSC/50% formamide
at room temperature) and remain bound when washed under
conditions of low stringency (2x SSC, room temperature,
more preferably 2x SSC, 42°C). Hybridizing under high
stringency refers to the above conditions in which
washing is performed at 2x SSC, 65°C (where SSC = 0.15M
NaCl, 0.015M sodium citrate, pH 7.2).
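
As a worked illustration of the buffer strengths named above, the
molar concentrations at 2x and 6x SSC follow directly from the 1x
definition given in the text (0.15M NaCl, 0.015M sodium citrate). The
sketch below is only an arithmetic aid; the names are hypothetical.

```python
# 1x SSC composition in mol/L, as defined in the text.
SSC_1X = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc_concentrations(fold):
    """Return the molar concentration of each component at fold x SSC."""
    return {component: molar * fold for component, molar in SSC_1X.items()}


if __name__ == "__main__":
    for fold in (2, 6):
        conc = ssc_concentrations(fold)
        print(f"{fold}x SSC: " + ", ".join(
            f"{molar:.3f}M {name}" for name, molar in conc.items()))
```

The 6x hybridization buffer thus contains about 0.9M NaCl, and the
2x wash buffer about 0.3M NaCl.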

"Sequence identity" as referred to herein refers to 
the value obtained when assessed using ClustalW
(Thompson et al., 1994, Nucl. Acids Res., 22,
p4673-4680) with the following parameters:
Pairwise alignment parameters - Method: accurate,
Matrix: IUB, Gap open penalty: 15.00, Gap extension
penalty: 6.66;
Multiple alignment parameters - Matrix: IUB, Gap open
penalty: 15.00, % identity for delay: 30, Negative
matrix: no, Gap extension penalty: 6.66, DNA transitions
weighting: 0.5.

Sequence identity at a particular base is intended
to include identical bases which have simply been
derivatized.
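
The following sketch shows how a percentage identity might be read off
a pair of sequences that have already been aligned. It is not ClustalW
and does not reproduce the parameters listed above: it assumes the
alignment (with "-" for gaps) has been produced elsewhere, and it
assumes the alignment length as the denominator, which is only one of
several conventions. The function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns holding the same base in both rows.

    Both inputs must be rows of one alignment and so have equal length.
    Columns where either row has a gap never count as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper())
        if base_a == base_b and base_a != gap
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a, aligned_b, threshold=70.0):
    """Check against the 'at least 70%' figure quoted in the text."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For instance, two aligned 10-base sequences differing at one position
give 90% identity under this convention.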

The invention also extends to polypeptides encoded
by the mRNA sequence to which a Table 1 oligonucleotide
or a Table 1 derived oligonucleotide binds. The
invention further extends to antibodies which bind to
any of said polypeptides.

As described above, conveniently said set of
oligonucleotide probes may be immobilized on one or more
solid supports. Single or preferably multiple copies of
each unique probe are attached to said solid supports,
e.g. 10 or more, e.g. at least 100 copies of each unique
probe are present.

One or more unique oligonucleotide probes may be
associated with separate solid supports which together
form a set of probes immobilized on multiple solid
supports, e.g. one or more unique probes may be
immobilized on multiple beads, membranes, filters,
biochips etc. which together form a set of probes and
which together form modules of the kit described hereinafter.
The solid supports of the different modules are
conveniently physically associated although the signals
associated with each probe (generated as described
hereinafter) must be separately determinable.

Alternatively, the probes may be immobilized on
discrete portions of the same solid support, e.g. each
unique oligonucleotide probe, e.g. in multiple copies,
may be immobilized to a distinct and discrete portion or
region of a single filter or membrane, e.g. to generate
an array.

A combination of such techniques may also be used,
e.g. several solid supports may be used which each
immobilize several unique probes.

The expression "solid support" shall mean any solid 
material able to bind oligonucleotides by hydrophobic, 
ionic or covalent bridges. 

"Immobilization" as used herein refers to 
reversible or irreversible association of the probes to
said solid support by virtue of such binding. If
reversible, the probes remain associated with the solid
support for a time sufficient for methods of the
invention to be carried out.

Numerous solid supports suitable as immobilizing
moieties according to the invention are well known in
the art and widely described in the literature and
generally speaking, the solid support may be any of the
well-known supports or matrices which are currently
widely used or proposed for immobilization, separation
etc, in chemical or biochemical procedures. Such
materials include, but are not limited to, any synthetic
organic polymer such as polystyrene, polyvinyl chloride,
polyethylene; or nitrocellulose and cellulose acetate;
or tosyl activated surfaces; or glass or nylon or any
surface carrying a group suited for covalent coupling of
nucleic acids. The immobilizing moieties may take the
form of particles, sheets, gels, filters, membranes,
microfibre strips, tubes or plates, fibres or
capillaries, made for example of a polymeric material
e.g. agarose, cellulose, alginate, teflon, latex or
polystyrene or magnetic beads. Solid supports allowing
the presentation of an array, preferably in a single
dimension are preferred, e.g. sheets, filters,
membranes, plates or biochips.

Attachment of the nucleic acid molecules to the
solid support may be performed directly or indirectly.
For example if a filter is used, attachment may be
performed by UV-induced crosslinking. Alternatively,
attachment may be performed indirectly by the use of an
attachment moiety carried on the oligonucleotide probes
and/or solid support. Thus for example, a pair of
affinity binding partners may be used, such as avidin,
streptavidin or biotin, DNA or DNA binding protein (e.g.
either the lac I repressor protein or the lac operator
sequence to which it binds), antibodies (which may be
mono- or polyclonal), antibody fragments or the epitopes
or haptens of antibodies. In these cases, one partner
of the binding pair is attached to (or is inherently
part of) the solid support and the other partner is
attached to (or is inherently part of) the nucleic acid
molecules.

As used herein an "affinity binding pair" refers to
two components which recognize and bind to one another
specifically (ie. in preference to binding to other
molecules). Such binding pairs when bound together form
a complex.

Attachment of appropriate functional groups to the
solid support may be performed by methods well known in
the art, which include for example, attachment through
hydroxyl, carboxyl, aldehyde or amino groups which may
be provided by treating the solid support to provide
suitable surface coatings. Solid supports presenting
appropriate moieties for attachment of the binding
partner may be produced by routine methods known in the
art.

Attachment of appropriate functional groups to the
oligonucleotide probes of the invention may be performed
by ligation or introduced during synthesis or
amplification, for example using primers carrying an
appropriate moiety, such as biotin or a particular
sequence for capture.

Conveniently, the set of probes described
hereinbefore is provided in kit form.

Thus viewed from a further aspect the present
invention provides a kit comprising a set of
oligonucleotide probes as described hereinbefore
immobilized on one or more solid supports.

Preferably, said probes are immobilized on a single
solid support and each unique probe is attached to a
different region of said solid support. However, when
attached to multiple solid supports, said multiple solid
supports form the modules which make up the kit.
Especially preferably said solid support is a sheet,
filter, membrane, plate or biochip.

Optionally the kit may also contain information
relating to the signals generated by normal or diseased
samples (as discussed in more detail hereinafter in
relation to the use of the kits), standardizing
materials, e.g. mRNA or cDNA from normal and/or diseased
samples for comparative purposes, labels for
incorporation into cDNA, adapters for introducing
nucleic acid sequences for amplification purposes,
primers for amplification and/or appropriate enzymes,
buffers and solutions. Optionally said kit may also
contain a package insert describing how the method of
the invention should be performed, optionally providing
standard graphs, data or software for interpretation of
results obtained when performing the invention.

The use of such kits to prepare a standard
diagnostic gene transcript pattern as described
hereinafter forms a further aspect of the invention.

The set of probes as described herein has various
uses. Principally however they are used to assess the
gene expression state of a test cell to provide
information relating to the organism from which said
cell is derived. Thus the probes are useful in
diagnosing, identifying or monitoring a disease or
condition or stage thereof in an organism.

Thus in a further aspect the invention provides the
use of a set of oligonucleotide probes or a kit as
described hereinbefore to determine the gene expression
pattern of a cell which pattern reflects the level of
gene expression of genes to which said oligonucleotide
probes bind, comprising at least the steps of:

a) isolating mRNA from said cell, which may
optionally be reverse transcribed to cDNA;

b) hybridizing the mRNA or cDNA of step (a) to a
set of oligonucleotide probes or a kit as defined
herein; and

c) assessing the amount of mRNA or cDNA hybridizing
to each of said probes to produce said pattern.

The mRNA and cDNA as referred to in this method,
and the methods hereinafter, encompass derivatives or
copies of said molecules, e.g. copies of such molecules
such as those produced by amplification or the
preparation of complementary strands, but which retain
the identity of the mRNA sequence, ie. would hybridize
to the direct transcript (or its complementary sequence)
by virtue of precise complementarity, or sequence
identity, over at least a region of said molecule. It
will be appreciated that complementarity will not exist
over the entire region where techniques have been used
which may truncate the transcript or introduce new
sequences, e.g. by primer amplification. For
convenience, said mRNA or cDNA is preferably amplified
prior to step b). As with the oligonucleotides
described herein said molecules may be modified, e.g. by
using non-natural bases during synthesis providing
complementarity remains. Such molecules may also carry
additional moieties such as signalling or immobilizing
means.

The various steps involved in the method of
preparing such a pattern are described in more detail
hereinafter.

As used herein "gene expression" refers to
transcription of a particular gene to produce a specific
mRNA product (i.e. a particular splicing product). The
level of gene expression may be determined by assessing
the level of transcribed mRNA molecules or cDNA
molecules reverse transcribed from the mRNA molecules or
products derived from those molecules, e.g. by
amplification.

The "pattern" created by this technique refers to
information which, for example, may be represented in
tabular or graphical form and conveys information about
the signal associated with two or more oligonucleotides.
Preferably said pattern is expressed as an array of
numbers relating to the expression level associated with
each probe.




Preferably, said pattern is established using the
following linear model:

y = Xb + f          (Equation 1)

wherein X is the matrix of gene expression data, y
is the response variable, b is the regression
coefficient vector and f is the estimated residual vector.
Although many different methods can be used to establish
the relationship provided in Equation 1, especially
preferably the Partial Least Squares Regression (PLSR)
method is used.
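By way of illustration only, the relationship of Equation 1 might be estimated as in the following minimal sketch. It assumes a single PLS component (the NIPALS PLS1 procedure), a response y coded 0 for normal and 1 for diseased samples, and invented toy data; the function names `pls1_one_component` and `predict` are hypothetical and are not taken from the specification.

```python
def pls1_one_component(X, y):
    """Estimate b (and an intercept) in y = Xb + f with one PLS component.

    X is a list of samples, each a list of probe signals; y is the response.
    Assumes y is not constant and co-varies with at least one probe.
    """
    n, p = len(X), len(X[0])
    # Centre each probe column and the response.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector w is proportional to X'y, scaled to unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores t = Xw, then regress y on t to obtain the loading q.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    # With a single component the coefficient vector is b = w * q.
    b = [q * wj for wj in w]
    intercept = y_mean - sum(b[j] * x_mean[j] for j in range(p))
    return b, intercept


def predict(x, b, intercept):
    """Insert the probe signals x of a test sample into the fitted model."""
    return intercept + sum(bj * xj for bj, xj in zip(b, x))


# Toy data (assumed): two probes, two normal and two diseased samples.
X = [[1.0, 5.0], [1.2, 4.8], [3.0, 2.0], [3.2, 2.2]]
y = [0.0, 0.0, 1.0, 1.0]
b, intercept = pls1_one_component(X, y)
```

A predicted value close to 1 for a test sample would then suggest the diseased class and a value close to 0 the normal class; in practice more components and cross-validation would ordinarily be used.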

The probes are thus used to generate a pattern
which reflects the gene expression of a cell at the time
of its isolation. The pattern of expression is
characteristic of the circumstances under which that
cell finds itself and depends on the influences to
which the cell has been exposed. Thus, a characteristic
gene transcript pattern standard or fingerprint
(standard probe pattern) for cells from an individual
with a particular disease or condition may be prepared
and used for comparison to transcript patterns of test
cells. This has clear applications in diagnosing,
monitoring or identifying whether an organism is
suffering from a particular disease, condition or stage
thereof.

The standard pattern is prepared by determining the
extent of binding of total mRNA (or cDNA or related
product), from cells from a sample of one or more
organisms with the disease or condition or stage
thereof, to the probes. This reflects the level of
transcripts which are present which correspond to each
unique probe. The amount of nucleic acid material which
binds to the different probes is assessed and this
information together forms the gene transcript pattern
standard of that disease or condition or stage thereof.
Each such standard pattern is characteristic of the
disease, condition or stage thereof.

In a further aspect therefore, the present
invention provides a method of preparing a standard gene
transcript pattern characteristic of a disease or
condition or stage thereof in an organism comprising at
least the steps of:

a) isolating mRNA from the cells of a sample of one
or more organisms having the disease or condition or
stage thereof, which may optionally be reverse
transcribed to cDNA;

b) hybridizing the mRNA or cDNA of step (a) to a
set of oligonucleotides or a kit as described
hereinbefore specific for said disease or condition or
stage thereof in an organism and sample thereof
corresponding to the organism and sample thereof under
investigation; and

c) assessing the amount of mRNA or cDNA hybridizing
to each of said probes to produce a characteristic
pattern reflecting the level of gene expression of genes
to which said oligonucleotides bind, in the sample with
the disease, condition or stage thereof.
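The data-handling part of step c), when more than one organism contributes to the standard, might be sketched as follows. This is an illustrative assumption rather than the method of the specification: each sample is taken to be a mapping from a probe identifier to a measured signal, and the standard pattern is summarised as a mean and a sample standard deviation per probe. The name `standard_pattern` and the probe identifiers are hypothetical.

```python
from statistics import mean, stdev


def standard_pattern(samples):
    """Summarise probe signals from several disease samples.

    samples: list of dicts mapping probe id -> hybridization signal.
    Returns a dict mapping probe id -> (mean signal, standard deviation).
    """
    pattern = {}
    for probe in sorted(samples[0]):
        values = [sample[probe] for sample in samples]
        # A standard deviation needs at least two samples.
        spread = stdev(values) if len(values) > 1 else 0.0
        pattern[probe] = (mean(values), spread)
    return pattern


# Invented signals for two probes measured in two disease samples.
disease_samples = [
    {"probe_1": 2.0, "probe_2": 10.0},
    {"probe_1": 4.0, "probe_2": 10.0},
]
pattern = standard_pattern(disease_samples)
```

Keeping the spread alongside the mean allows a later comparison to take account of the range of values seen among the samples used to build the standard.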

For convenience, said oligonucleotides are 
preferably immobilized on one or more solid supports. 

The standard patterns for a great number of diseases
or conditions and different stages thereof using
particular probes may be accumulated in databases and be
made available to laboratories on request.

"Disease" samples and organisms as referred to 
herein refer to organisms (or samples from the same)
with an underlying pathological disturbance relative to
a normal organism (or sample), in a symptomatic or
asymptomatic organism, which may result, for example,
from infection or an acquired or congenital genetic
imperfection. Such organisms are known to have, or
exhibit, the disease or condition or stage thereof
under study.

A "condition" refers to a state of the mind or body
of an organism which has not occurred through disease,
e.g. the presence of an agent in the body such as a
toxin, drug or pollutant, or pregnancy.

"Stages" thereof refer to different stages of the 
disease or condition which may or may not exhibit
particular physiological or metabolic changes, but do
exhibit changes at the genetic level which may be
detected as altered gene expression. It will be
appreciated that during the course of a disease or
condition the expression of different transcripts may
vary. Thus at different stages, altered expression may
not be exhibited for particular transcripts compared to
"normal" samples. However, combining information from
several transcripts which exhibit altered expression at
one or more stages through the course of the disease or
condition can be used to provide a characteristic
pattern which is indicative of a particular stage of the
disease or condition. Thus for example different stages
in cancer, e.g. pre-stage I, stage I, stage II, III or IV
can be identified.

"Normal" as used herein refers to organisms or
samples which are used for comparative purposes.
Preferably, these are "normal" in the sense that they do
not exhibit any indication of, or are not believed to
have, any disease or condition that would affect gene
expression, particularly in respect of the disease for
which they are to be used as the normal standard.
However, it will be appreciated that different stages of
a disease or condition may be compared and in such
cases, the "normal" sample may correspond to the earlier
stage of the disease or condition.

As used herein a "sample" refers to any material
obtained from the organism, e.g. human or non-human
animal under investigation, which contains cells and
includes tissues, body fluid or body waste or, in the
case of prokaryotic organisms, the organism itself.
"Body fluids" include blood, saliva, spinal fluid,
semen and lymph. "Body waste" includes urine, expectorated
matter (pulmonary patients), faeces etc. "Tissue
samples" include tissue obtained by biopsy, by surgical
interventions or by other means, e.g. placenta.
Preferably however, the samples which are examined are
from areas of the body not apparently affected by the
disease or condition. The cells in such samples are not
disease cells, e.g. cancer cells, have not been in
contact with such disease cells and do not originate
from the site of the disease or condition. The "site of
disease" is considered to be that area of the body which
manifests the disease in a way which may be objectively
determined, e.g. a tumour or area of inflammation. Thus
for example peripheral blood may be used for the
diagnosis of non-haematopoietic cancers, and the blood
does not require the presence of malignant or
disseminated cells from the cancer in the blood.
Similarly in diseases of the brain, in which no diseased
cells are found in the blood due to the blood-brain
barrier, peripheral blood may still be used in the
methods of the invention.

It will however be appreciated that the method of
preparing the standard transcription pattern and other
methods of the invention are also applicable for use on
living parts of eukaryotic organisms such as cell lines
and organ cultures and explants.

As used herein, reference to "corresponding" sample
etc. refers to cells preferably from the same tissue,
body fluid or body waste, but also includes cells from
tissue, body fluid or body waste which are sufficiently
similar for the purposes of preparing the standard or
test pattern. When used in reference to genes
"corresponding" to the probes, this refers to genes
which are related by sequence (which may be
complementary) to the probes although the probes may
reflect different splicing products of expression.

"Assessing" as used herein refers to both 
quantitative and qualitative assessment which may be
determined in absolute or relative terms.

The invention may be put into practice as follows.
To prepare a standard transcript pattern for a
particular disease, condition or stage thereof, sample
mRNA is extracted from the cells of tissues, body fluid
or body waste according to known techniques (see for
example Sambrook et al. (1989), Molecular Cloning: A
Laboratory Manual, 2nd Ed., Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, N.Y.) from a
diseased individual or organism.

Owing to the difficulties in working with RNA, the
RNA is preferably reverse transcribed at this stage to
form first strand cDNA. Cloning of the cDNA or
selection from, or using, a cDNA library is not however
necessary in this or other methods of the invention.
Preferably, the complementary strands of the first
strand cDNAs are synthesized, i.e. second strand cDNAs,
but this will depend on which relative strands are
present in the oligonucleotide probes. The RNA may
however alternatively be used directly without reverse
transcription and may be labelled if so required.

Preferably the cDNA strands are amplified by known
amplification techniques such as the polymerase chain
reaction (PCR) by the use of appropriate primers.
Alternatively, the cDNA strands may be cloned with a
vector, used to transform a bacterium such as E. coli
which may then be grown to multiply the nucleic acid
molecules. When the sequences of the cDNAs are not
known, primers may be directed to regions of the nucleic
acid molecules which have been introduced. Thus for
example, adapters may be ligated to the cDNA molecules
and primers directed to these portions for amplification
of the cDNA molecules. Alternatively, in the case of
eukaryotic samples, advantage may be taken of the polyA
tail and cap of the RNA to prepare appropriate primers.

To produce the standard diagnostic gene transcript
pattern or fingerprint for a particular disease or
condition or stage thereof, the above described
oligonucleotide probes are used to probe mRNA or cDNA of
the diseased sample to produce a signal for
hybridization to each particular oligonucleotide probe
species, i.e. each unique probe. A standard control gene
transcript pattern may also be prepared if desired using
mRNA or cDNA from a normal sample. Thus, mRNA or cDNA
is brought into contact with the oligonucleotide probe
under appropriate conditions to allow hybridization.

When multiple samples are probed, this may be
performed consecutively using the same probes, e.g. on
one or more solid supports, i.e. on probe kit modules, or
by simultaneously hybridizing to corresponding probes,
e.g. the modules of a corresponding probe kit.

To identify when hybridization occurs and obtain an
indication of the number of transcripts/cDNA molecules
which become bound to the oligonucleotide probes, it is
necessary to identify a signal produced when the
transcripts (or related molecules) hybridize (e.g. by
detection of double stranded nucleic acid molecules or
detection of the number of molecules which become bound,
after removing unbound molecules, e.g. by washing).

In order to achieve a signal, either or both
components which hybridize (i.e. the probe and the
transcript) carry or form a signalling means or a part
thereof. This "signalling means" is any moiety capable
of direct or indirect detection by the generation or
presence of a signal. The signal may be any detectable
physical characteristic such as conferred by radiation
emission, scattering or absorption properties, magnetic
properties, or other physical properties such as charge,
size or binding properties of existing molecules (e.g.
labels) or molecules which may be generated (e.g. gas
emission etc.). Techniques are preferred which allow
signal amplification, e.g. which produce multiple signal
events from a single active binding site, e.g. by the
catalytic action of enzymes to produce multiple
detectable products.

Conveniently the signalling means may be a label
which itself provides a detectable signal. Conveniently
this may be achieved by the use of a radioactive or
other label which may be incorporated during cDNA
production, the preparation of complementary cDNA
strands, during amplification of the target mRNA/cDNA or
added directly to target nucleic acid molecules.

Appropriate labels are those which directly or
indirectly allow detection or measurement of the
presence of the transcripts/cDNA. Such labels include
for example radiolabels, chemical labels, for example
chromophores or fluorophores (e.g. dyes such as
fluorescein and rhodamine), or reagents of high electron
density such as ferritin, haemocyanin or colloidal gold.
Alternatively, the label may be an enzyme, for example
peroxidase or alkaline phosphatase, wherein the presence
of the enzyme is visualized by its interaction with a
suitable entity, for example a substrate. The label may
also form part of a signalling pair wherein the other
member of the pair is found on, or in close proximity
to, the oligonucleotide probe to which the
transcript/cDNA binds; for example, a fluorescent
compound and a quench fluorescent substrate may be used.
A label may also be provided on a different entity, such
as an antibody, which recognizes a peptide moiety
attached to the transcripts/cDNA, for example attached
to a base used during synthesis or amplification.

A signal may be achieved by the introduction of a
label before, during or after the hybridization step.
Alternatively, the presence of hybridizing transcripts
may be identified by other physical properties, such as
their absorbance, in which case the signalling means
is the complex itself.

The amount of signal associated with each
oligonucleotide probe is then assessed. The assessment
may be quantitative or qualitative and may be based on
binding of a single transcript species (or related cDNA
or other products) to each probe, or binding of multiple
transcript species to multiple copies of each unique
probe. It will be appreciated that quantitative results
will provide further information for the transcript
fingerprint of the disease which is compiled. This data
may be expressed as absolute values (in the case of
macroarrays) or may be determined relative to a
particular standard or reference, e.g. a normal control
sample.
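Where signals are expressed relative to a reference, one common convention (assumed here, not prescribed by the specification) is a base-2 logarithm of the ratio of test signal to reference signal for each probe. The following sketch illustrates this; the function name `relative_pattern`, the `floor` guard against zero signals and the probe identifiers are hypothetical.

```python
import math


def relative_pattern(test, reference, floor=1e-9):
    """Express each probe signal as a log2 ratio to a reference sample.

    test, reference: dicts mapping probe id -> absolute signal.
    floor: small positive value substituted for zero or negative signals
           so that the ratio and its logarithm remain defined.
    """
    return {
        probe: math.log2(max(test[probe], floor) / max(reference[probe], floor))
        for probe in test
    }


# Invented absolute signals for a test sample and a normal control.
ratios = relative_pattern(
    {"probe_1": 8.0, "probe_2": 1.0},
    {"probe_1": 2.0, "probe_2": 4.0},
)
```

On this scale a positive value indicates a higher signal than in the reference, a negative value a lower signal, and zero indicates no change.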

Furthermore it will be appreciated that the
standard diagnostic gene transcript pattern may be
prepared using one or more disease samples (and normal
samples if used) to perform the hybridization step to
obtain patterns not biased towards a particular
individual's variations in gene expression.

The use of the probes to prepare standard patterns
and the standard diagnostic gene transcript patterns
thus produced for the purpose of identification or
diagnosis or monitoring of a particular disease or
condition or stage thereof in a particular organism
forms a further aspect of the invention.

Once a standard diagnostic fingerprint or pattern
has been determined for a particular disease or
condition using the selected oligonucleotide probes,
this information can be used to identify the presence,
absence or extent or stage of that disease or condition
in a different test organism or individual.

To examine the gene expression pattern of a test
sample, a test sample of tissue, body fluid or body
waste containing cells, corresponding to the sample used
for the preparation of the standard pattern, is obtained
from a patient or the organism to be studied. A test
gene transcript pattern is then prepared as described
hereinbefore for the standard pattern.

In a further aspect therefore, the present
invention provides a method of preparing a test gene
transcript pattern comprising at least the steps of:

a) isolating mRNA from the cells of a sample of
said test organism, which may optionally be reverse
transcribed to cDNA;

b) hybridizing the mRNA or cDNA of step (a) to a
set of oligonucleotides or a kit as described
hereinbefore specific for a disease or condition or
stage thereof in an organism and sample thereof
corresponding to the organism and sample thereof under
investigation; and

c) assessing the amount of mRNA or cDNA hybridizing
to each of said probes to produce said pattern
reflecting the level of gene expression of genes to
which said oligonucleotides bind, in said test sample.

This test pattern may then be compared to one or
more standard patterns to assess whether the sample
contains cells having the disease, condition or stage
thereof.

Thus viewed from a further aspect the present
invention provides a method of diagnosing or identifying
or monitoring a disease or condition or stage thereof in
an organism, comprising the steps of:

a) isolating mRNA from the cells of a sample of
said organism, which may optionally be reverse
transcribed to cDNA;

b) hybridizing the mRNA or cDNA of step (a) to a
set of oligonucleotides or a kit as described
hereinbefore specific for said disease or
condition or stage thereof in an organism and
sample thereof corresponding to the organism
and sample thereof under investigation;

c) assessing the amount of mRNA or cDNA
hybridizing to each of said probes to produce
a characteristic pattern reflecting the level
of gene expression of genes to which said
oligonucleotides bind, in said sample; and

d) comparing said pattern to a standard
diagnostic pattern prepared according to the
method of the invention using a sample from an
organism corresponding to the organism and
sample under investigation to determine the
presence of said disease or condition or a
stage thereof in the organism under
investigation.
The method up to and including step c) is the
preparation of a test pattern as described above.

As referred to herein, "diagnosis" refers to
determination of the presence or existence of a disease
or condition or stage thereof in an organism.
"Monitoring" refers to establishing the extent of a
disease or condition, particularly when an individual is
known to be suffering from a disease or condition, for
example to monitor the effects of treatment or the
development of a disease or condition, e.g. to determine
the suitability of a treatment or provide a prognosis.

The presence of the disease or condition or stage
thereof may be determined by determining the degree of
correlation between the standard and test samples'
patterns. This necessarily takes into account the range
of values which are obtained for normal and diseased
samples. Although this can be established by obtaining
standard deviations for several representative samples
binding to the probes to develop the standard, it will
be appreciated that single samples may be sufficient to
generate the standard pattern to identify a disease if
the test sample exhibits close enough correlation to
that standard. Conveniently, the presence, absence, or
extent of a disease or condition or stage thereof in a
test sample can be predicted by inserting the data
relating to the expression level of informative probes
in the test sample into the standard diagnostic probe
pattern established according to Equation 1.
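The degree of correlation between a standard pattern and a test pattern might, for illustration, be measured with the Pearson correlation coefficient, as in the following minimal sketch. The choice of Pearson correlation is an assumption, since the passage above does not name a specific correlation measure, and the function name `pearson` and the signal values are invented.

```python
def pearson(a, b):
    """Pearson correlation between two equal-length lists of probe signals.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Invented signals for the same four probes, listed in the same order.
standard = [1.0, 2.0, 3.0, 4.0]
test = [1.1, 1.9, 3.2, 3.9]
r = pearson(standard, test)
```

A coefficient near 1 indicates that the test pattern closely follows the standard; how close is close enough would depend on the spread observed among the normal and diseased samples.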

Data generated using the above mentioned methods
may be analysed using various techniques from the most
basic visual representation (e.g. relating to intensity)
to more complex data manipulation to identify underlying
patterns which reflect the interrelationship of the
level of expression of each gene to which the various
probes bind, which may be quantified and expressed
mathematically. Conveniently, the raw data thus
generated may be manipulated by the data processing and
statistical methods described hereinafter, particularly
normalizing and standardizing the data and fitting the
data to a classification model to determine whether said
test data reflects the pattern of a particular disease,
condition or stage thereof.
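The normalizing and standardizing steps are detailed later in the specification; purely as an illustrative assumption, they might take the form sketched below, in which each sample is scaled to unit total signal and each probe is then standardized to zero mean and unit variance across samples. The function names `normalize_sample` and `standardize_columns` and the data are hypothetical.

```python
from statistics import mean, pstdev


def normalize_sample(signals):
    """Scale one sample so that its probe signals sum to 1.

    Assumes the total signal is non-zero.
    """
    total = sum(signals)
    return [s / total for s in signals]


def standardize_columns(matrix):
    """Standardize each probe (column) to zero mean and unit variance.

    matrix: list of samples, each a list of probe signals.
    Probes with no variation across samples are set to 0.0.
    """
    standardized_cols = []
    for col in zip(*matrix):
        m, sd = mean(col), pstdev(col)
        standardized_cols.append([(v - m) / sd if sd else 0.0 for v in col])
    # Transpose back to one row per sample.
    return [list(row) for row in zip(*standardized_cols)]


# Invented data: two samples, two probes.
raw = [[1.0, 5.0], [3.0, 5.0]]
standardized = standardize_columns(raw)
```

Data treated in this way could then be passed to a classification model such as the linear model of Equation 1.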

The methods described herein may be used to
identify, monitor or diagnose a disease, condition or
ailment or its stage or progression, for which the
oligonucleotide probes are informative. "Informative"
probes as described herein are those which reflect
genes which have altered expression in the diseases or
conditions in question, or particular stages thereof.

Probes of the invention may not be sufficiently
informative for diagnostic purposes when used alone, but
are informative when used as one of several probes to
provide a characteristic pattern, e.g. in a set as
described hereinbefore.

Preferably said probes correspond to genes which
are systemically affected by said disease, condition or
stage thereof. Especially preferably said genes, from
which transcripts are derived which bind to probes of
the invention, are metabolic or house-keeping genes and
preferably are moderately or highly expressed. The
advantage of using probes directed to moderately or
highly expressed genes is that smaller clinical samples
are required for generating the necessary gene
expression data set, e.g. blood samples of less than 1 ml.

Furthermore, it has been found that such genes
which are already being actively transcribed tend to be
more prone to being influenced, in a positive or
negative way, by new stimuli. In addition, since
transcripts are already being produced at levels which
are generally detectable, small changes in those levels
are readily detectable as, for example, a certain
detectable threshold does not need to be reached.

In preferred methods of the invention, the set of
probes of the invention is informative for a variety of
different diseases, conditions or stages thereof. A
sub-set of the probes disclosed herein may be used for
diagnosis, identification or monitoring of a particular
disease, condition or stage thereof.

Thus the probes may be used to diagnose or identify
or monitor any condition, ailment, disease or reaction
that leads to the relative increase or decrease in the
activity of informative genes of any or all eukaryotic
or prokaryotic organisms regardless of whether these
changes have been caused by the influence of bacteria,
virus, prions, parasites, fungi, radiation, natural or
artificial toxins, drugs or allergens, including mental
conditions due to stress, neurosis, psychosis or
deteriorations due to the ageing of the organism, and
conditions or diseases of unknown cause, providing a
sub-set of the probes as described herein is
informative for said disease or condition or stage
thereof.

Such diseases include those which result in
metabolic or physiological changes, such as
fever-associated diseases such as influenza or malaria. Other
diseases which may be detected include for example
yellow fever, sexually transmitted diseases such as
gonorrhea, fibromyalgia, Candida-related complex, cancer
(for example of the stomach, lung, breast, prostate
gland, bowel, skin, colon, ovary etc), Alzheimer's
disease, disease caused by retroviruses such as HIV,
senile dementia, multiple sclerosis and
Creutzfeldt-Jakob disease to mention a few.

The invention may also be used to identify patients
with psychiatric or psychosomatic diseases such as
schizophrenia and eating disorders. Of particular
importance is the use of this method to detect diseases,
conditions, or stages thereof, which are not readily
detectable by known diagnostic methods, such as HIV
which is generally not detectable using known techniques
1 to 4 months following infection. Conditions which may
be identified include for example drug abuse, such as
the use of narcotics, alcohol, steroids or performance
enhancing drugs.

Preferably said disease to be identified or 
monitored is a cancer or a degenerative brain disorder 
(such as Alzheimer's or Parkinson's disease). 

In particular, a set of oligonucleotide probes,
wherein said set comprises at least 10 oligonucleotides
selected from:

an oligonucleotide as described in Table 4 or an
oligonucleotide derived therefrom or an
oligonucleotide with a complementary sequence, or a
functionally equivalent oligonucleotide,

may be used for diagnosis or identification or
monitoring the progression of Alzheimer's disease.
Similarly Table 2 probes and Table 2 derived probes and
their functional equivalents may be used to diagnose,
identify or monitor the progression of breast cancer.
Especially preferably the probes used for breast cancer
analysis are selected based on their occurrence as set
forth in Table 3 and as described hereinbefore.

The diagnostic method may be used alone as an
alternative to other diagnostic techniques or in
addition to such techniques. For example, methods of
the invention may be used as an alternative or additive
diagnostic measure to diagnosis using imaging techniques
such as Magnetic Resonance Imaging (MRI), ultrasound
imaging, nuclear imaging or X-ray imaging, for example
in the identification and/or diagnosis of tumours.

The methods of the invention may be performed on
cells from prokaryotic or eukaryotic organisms, which may
be any eukaryotic organism such as human beings, other
mammals and animals, birds, insects, fish and plants,
and any prokaryotic organism such as a bacterium.

Preferred non-human animals on which the methods of
the invention may be conducted include, but are not
limited to, mammals, particularly primates, domestic
animals, livestock and laboratory animals. Thus
preferred animals for diagnosis include mice, rats,
guinea pigs, cats, dogs, pigs, cows, goats, sheep and
horses. Particularly preferably the disease state or
condition of humans is diagnosed, identified or
monitored.

As described above, the sample under study may be
any convenient sample which may be obtained from an
organism. Preferably however, as mentioned above, the
sample is obtained from a site distant to the site of
disease and the cells in such samples are not disease
cells, have not been in contact with such cells and do
not originate from the site of the disease or condition.
In such cases, although preferably absent, the sample
may contain cells which do not fulfil these criteria.
However, since the probes of the invention are concerned
with transcripts whose expression is altered in cells
which do satisfy these criteria, the probes are
specifically directed to detecting changes in transcript
levels in those cells even if in the presence of other,
background cells.

It has been found that the cells from such samples
show significant and informative variations in the gene
expression of a large number of genes. Thus, the same
probe (or several probes) may be found to be informative
in determinations regarding two or more diseases,
conditions or stages thereof by virtue of the particular
level of transcripts binding to that probe or the
interrelationship of the extent of binding to that probe
relative to other probes. As a consequence, it is
possible to use a relatively small number of probes for
screening for multiple disorders or diseases. This has
consequences with regard to the selection of probes,
discussed in relation to random identification of probes
hereinafter, but also for the use of a single set of
probes for more than one diagnosis. Table 9, which
represents preferred probes of the invention, discloses
probes which are informative for both Alzheimer's and
breast cancer.

Thus, the present invention also provides sets of
probes for diagnosing, identifying or monitoring two or
more diseases, conditions or stages thereof, wherein at
least one of said probes is suitable for said
diagnosing, identifying or monitoring at least two of
said diseases, conditions or stages thereof, and kits
and methods of using the same. Preferably at least 5
probes, e.g. from 5 to 15 probes, are used in at least
two diagnoses.

Thus, in a further preferred aspect, the present
invention provides a method of diagnosis or
identification or monitoring as described hereinbefore
for the diagnosis, identification or monitoring of two
or more diseases, conditions or stages thereof in an
organism, wherein said test pattern produced in step c)
of the diagnostic method is compared in step d) to at
least two standard diagnostic patterns prepared as
described previously, wherein each standard diagnostic
pattern is a pattern generated for a different disease
or condition or stage thereof.
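The comparison of one test pattern against several standard patterns in step d) might be sketched as follows. The use of Euclidean distance as the measure of closeness is an assumption made for illustration, and the function name `closest_standard`, the labels and the signal values are hypothetical.

```python
import math


def closest_standard(test, standards):
    """Compare a test pattern to several standard patterns.

    test: list of probe signals.
    standards: dict mapping a disease label -> list of probe signals,
               with probes in the same order as in the test pattern.
    Returns the label of the nearest standard and all distances.
    """
    distances = {
        label: math.dist(test, pattern) for label, pattern in standards.items()
    }
    best = min(distances, key=distances.get)
    return best, distances


# Invented standard patterns for two conditions over the same four probes.
standards = {
    "disease_A": [1.0, 2.0, 3.0, 4.0],
    "disease_B": [4.0, 3.0, 2.0, 1.0],
}
label, distances = closest_standard([1.1, 1.9, 3.2, 3.9], standards)
```

In practice a threshold on the distance would also be needed, since a test sample may match none of the standards.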

Whilst in a preferred aspect the methods of
assessment concern the development of a gene transcript
pattern from a test sample and comparison of the same to
a standard pattern, the elevation or depression of
expression of certain markers may also be examined by
examining the products of expression and the level of
those products. Thus a standard pattern in relation to
the expressed product may be generated.

In such methods the levels of expression of a set
of polypeptides encoded by the gene to which an
oligonucleotide of Table 1 or a Table 1 derived
oligonucleotide binds are analysed.
Various diagnostic methods may be used to assess
the amount of polypeptides (or fragments thereof) which
are present. The presence or concentration of
polypeptides may be examined, for example by the use of
a binding partner to said polypeptide (e.g. an
antibody), which may be immobilized, to separate said
polypeptide from the sample and the amount of
polypeptide may then be determined.

"Fragments" of the polypeptides refers to a 
domain or region of said polypeptide, e.g. an antigenic
fragment, which is recognizable as being derived from
said polypeptide to allow binding of a specific binding
partner. Preferably such a fragment comprises a
significant portion of said polypeptide and corresponds
to a product of normal post-synthesis processing.

Thus in a further aspect the present invention
provides a method of preparing a standard gene
transcript pattern characteristic of a disease or
condition or stage thereof in an organism comprising at
least the steps of:

a) releasing target polypeptides from a sample of
one or more organisms having the disease or condition or
stage thereof;

b) contacting said target polypeptides with one or
more binding partners, wherein each binding partner is
specific to a marker polypeptide (or a fragment thereof)
encoded by the gene to which an oligonucleotide of Table
1 (or derived from a sequence described in Table 1)
binds, to allow binding of said binding partners to said
target polypeptides, wherein said marker polypeptides
are specific for said disease or condition thereof in an
organism and sample thereof corresponding to the
organism and sample thereof under investigation; and

c) assessing the target polypeptide binding to said
binding partners to produce a characteristic pattern
reflecting the level of gene expression of genes which
express said marker polypeptides, in the sample with the
disease, condition or stage thereof.

As used herein "target polypeptides" refer to those
polypeptides present in a sample which are to be
detected and "marker polypeptides" are polypeptides
which are encoded by the genes to which Table 1
oligonucleotides or Table 1 derived oligonucleotides
bind. The target and marker polypeptides are identical
or at least have areas of high similarity, e.g. epitopic
regions to allow recognition and binding of the binding
partner.

"Release" of the target polypeptides refers to
appropriate treatment of a sample to provide the
polypeptides in a form accessible for binding of the
binding partners, e.g. by lysis of cells where these are
present. The samples used in this case need not
necessarily comprise cells as the target polypeptides
may be released from cells into the surrounding tissue
or fluid, and this tissue or fluid may be analysed, e.g.
urine or blood. Preferably however the preferred
samples as described herein are used. "Binding
partners" comprise the separate entities which together
make an affinity binding pair as described above,
wherein one partner of the binding pair is the target or
marker polypeptide and the other partner binds
specifically to that polypeptide, e.g. an antibody.

Various arrangements may be envisaged for detecting
the amount of binding pairs which form. In its simplest
form, a sandwich type assay, e.g. an immunoassay such as
an ELISA, may be used in which an antibody specific to
the polypeptide and carrying a label (as described
elsewhere herein) may be bound to the binding pair (e.g.
the first antibody:polypeptide pair) and the amount of
label detected.

Other methods as described herein may be similarly
modified for analysis of the protein product of
expression rather than the gene transcript and related
nucleic acid molecules.

Thus a further aspect of the invention provides a
method of preparing a test gene transcript pattern
comprising at least the steps of:

a) releasing target polypeptides from a sample of
said test organism;

b) contacting said target polypeptides with one or
more binding partners, wherein each binding partner is
specific to a marker polypeptide (or a fragment thereof)
encoded by the gene to which an oligonucleotide of Table
1 (or derived from a sequence described in Table 1)
binds, to allow binding of said binding partners to said
target polypeptides, wherein said marker polypeptides
are specific for said disease or condition thereof in an
organism and sample thereof corresponding to the
organism and sample thereof under investigation; and

c) assessing the target polypeptide binding to said
binding partners to produce a characteristic pattern
reflecting the level of gene expression of genes which
express said marker polypeptides, in said test sample.

A yet further aspect of the invention provides a
method of diagnosing or identifying or monitoring a
disease or condition or stage thereof in an organism
comprising the steps of:

a) releasing target polypeptides from a sample of
said organism;

b) contacting said target polypeptides with one or
more binding partners, wherein each binding partner is
specific to a marker polypeptide (or a fragment thereof)
encoded by the gene to which an oligonucleotide of Table
1 (or derived from a sequence described in Table 1)
binds, to allow binding of said binding partners to said
target polypeptides, wherein said marker polypeptides
are specific for said disease or condition thereof in an
organism and sample thereof corresponding to the
organism and sample thereof under investigation; and

c) assessing the target polypeptide binding to said
binding partners to produce a characteristic pattern
reflecting the level of gene expression of genes which
express said marker polypeptides in said sample; and

d) comparing said pattern to a standard diagnostic
pattern prepared as described hereinbefore using a
sample from an organism corresponding to the organism
and sample under investigation to determine the degree
of correlation indicative of the presence of said
disease or condition or a stage thereof in the organism
under investigation.

The methods of generating standard and test
patterns and diagnostic techniques rely on the use of
informative oligonucleotide probes to generate the gene
expression data. In some cases it will be necessary to
select these informative probes for a particular method,
e.g. to diagnose a particular disease, from a selection
of available probes, e.g. the probes described
hereinbefore (the Table 1 oligonucleotides, the Table 1
derived oligonucleotides, their complementary sequences
and functionally equivalent oligonucleotides). The
following methodology describes a convenient method for
identifying such informative probes, or more
particularly how to select a suitable sub-set of probes
from the probes described herein.

Probes for the analysis of a particular disease or
condition or stage thereof, may be identified in a
number of ways known in the prior art, including by
differential expression or by library subtraction (see
for example WO98/49342). As described hereinafter, in
view of the high information content of most
transcripts, as a starting point one may also simply
analyse a random sub-set of mRNA or cDNA species and
pick the most informative probes from that sub-set. The
following method describes the use of immobilized
oligonucleotide probes (e.g. the probes of the
invention) to which mRNA (or related molecules) from
different samples is bound to identify which probes are
the most informative to identify a particular type of
sample, e.g. a disease sample.

The immobilized probes can be derived from various
unrelated or related organisms; the only requirement is
that the immobilized probes should bind specifically to
their homologous counterparts in test organisms. Probes
can also be derived from commercially available or
public databases and immobilized on solid supports or,
as mentioned above, they can be randomly picked and
isolated from a cDNA library and immobilized on a solid
support.

The probes immobilised on the solid
support should be long enough to allow for specific
binding to the target sequences. The immobilised probes
can be in the form of DNA, RNA or their modified
products or PNAs (peptide nucleic acids). Preferably,
the probes immobilised should bind specifically to their
homologous counterparts representing highly and
moderately expressed genes in test organisms.
Conveniently the probes which are used are the probes
described herein.

The gene expression pattern of cells in biological
samples can be generated using prior art techniques such
as microarray or macroarray as described below or using
methods described herein. Several technologies have now
been developed for monitoring the expression level of a
large number of genes simultaneously in biological
samples, such as high-density oligoarrays (Lockhart et
al., 1996, Nat. Biotech., 14, p1675-1680), cDNA
microarrays (Schena et al., 1995, Science, 270, p467-470)
and cDNA macroarrays (Maier E et al., 1994, Nucl. Acids
Res., 22, p3423-3424; Bernard et al., 1996, Nucl. Acids
Res., 24, p1435-1442).

In high-density oligoarrays and cDNA microarrays,
hundreds and thousands of probe oligonucleotides or
cDNAs are spotted onto glass slides or nylon membranes,
or synthesized on biochips. The mRNA isolated from the
test and reference samples is labelled by reverse
transcription with a red or green fluorescent dye,
mixed, and hybridised to the microarray. After washing,
the bound fluorescent dyes are detected by a laser,
producing two images, one for each dye. The resulting
ratio of the red and green spots on the two images
provides the information about the changes in expression
levels of genes in the test and reference samples.
Alternatively, single channel or multiple channel
microarray studies can also be performed.
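
As a rough illustration of how the red/green ratio conveys a change in expression, the following sketch computes a log2 ratio for each spot. The function name, the assignment of dyes (red for the test sample, green for the reference sample) and the intensities are assumptions made purely for illustration.

```python
import math

def log2_ratios(red, green):
    """Return log2(red/green) per spot; None where either intensity is not positive."""
    ratios = []
    for r, g in zip(red, green):
        if r > 0 and g > 0:
            ratios.append(math.log2(r / g))
        else:
            ratios.append(None)  # ratio undefined for this spot
    return ratios

# Hypothetical intensities for three spots (red = test, green = reference).
red = [800.0, 200.0, 400.0]
green = [200.0, 800.0, 400.0]
print(log2_ratios(red, green))  # [2.0, -2.0, 0.0]
```

A positive value would indicate higher signal in the test sample, a negative value higher signal in the reference sample, and zero no change.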

In cDNA macroarray, different cDNAs are spotted on
a solid support such as nylon membranes in excess in
relation to the amount of test mRNA that can hybridise
to each spot. mRNA isolated from test samples is
radio-labelled by reverse transcription and hybridised to the
immobilised probe cDNA. After washing, the signals
associated with labels hybridising specifically to
immobilised probe cDNA are detected and quantified. The
data obtained in macroarray contains information about
the relative levels of transcripts present in the test
samples. Whilst macroarrays are only suitable to
monitor the expression of a limited number of genes,
microarrays can be used to monitor the expression of
several thousand genes simultaneously and are, therefore,
a preferred choice for large-scale gene expression
studies.

A macroarray technique for generating the gene
expression data set has been used to illustrate the
probe identification method described herein. For this
purpose, mRNA is isolated from samples of interest and
used to prepare labelled target molecules, e.g. mRNA or
cDNA as described above. The labelled target molecules
are then hybridised to probes immobilised on the solid
support. Various solid supports can be used for the
purpose, as described previously. Following
hybridization, unbound target molecules are removed and
signals from target molecules hybridizing to immobilised
probes quantified. If radio labelling is performed, a
PhosphoImager can be used to generate an image file that
can be used to generate a raw data set. Depending on
the nature of label chosen for labelling the target
molecules, other instruments can also be used, for
example, when fluorescence is used for labelling, a
FluoroImager can be used to generate an image file from
the hybridised target molecules.

The raw data corresponding to mean intensity,
median intensity, or volume of the signals in each spot
can be acquired from the image file using commercially
available software for image analysis. However, the
acquired data needs to be corrected for background
signals and normalized prior to analysis, since several
factors can affect the quality and quantity of the
hybridising signals. For example, variations in the
quality and quantity of mRNA isolated from sample to
sample, subtle variations in the efficiency of labelling
target molecules during each reaction, and variations in
the amount of unspecific binding between different
macroarrays can all contribute to noise in the acquired
data set that must be corrected for prior to analysis.

Background correction can be performed in several
ways. The lowest pixel intensity within a spot can be
used for background subtraction or the mean or median of
the line of pixels around the spots' outline can be used
for the purpose. One can also define an area
representing the background intensity based on the
signals generated from negative controls and use the
average intensity of this area for background
subtraction.
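
One of the options above, subtraction of the median of the pixels around the spot outline, might be sketched as follows. The function name, the flooring of negative results at zero and the pixel values are assumptions of this sketch and are not taken from the method described.

```python
from statistics import median

def background_correct(spot_pixels, border_pixels):
    """Subtract the median border intensity from the mean spot intensity.

    The result is floored at zero so that a spot dimmer than its
    surroundings does not yield a negative signal (an assumption here).
    """
    signal = sum(spot_pixels) / len(spot_pixels)
    background = median(border_pixels)
    return max(signal - background, 0.0)

# Hypothetical pixel intensities for one spot and the line of pixels around it.
spot = [120.0, 130.0, 125.0, 125.0]
border = [20.0, 22.0, 25.0, 21.0, 90.0]  # one bright outlier pixel
print(background_correct(spot, border))  # 103.0
```

The median is used here rather than the mean so that a single bright pixel in the border has little influence on the background estimate.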

The background corrected data can then be
transformed for stabilizing the variance in the data
structure and normalized for the differences in probe
intensity. Several transformation techniques have been
described in the literature and a brief overview can be
found in Cui, Kerr and Churchill
(http://www.jax.org/research/churchill/research/expression/Cui-Transform.pdf).
Normalization can be
performed by dividing the intensity of each spot by
the collective intensity, average intensity or median
intensity of all the spots in a macroarray or a group of
spots in a macroarray in order to obtain the relative
intensity of signals hybridising to immobilised probes
in a macroarray. Several methods have been described
for normalizing gene expression data (Richmond and
Somerville, 2000, Current Opin. Plant Biol., 3, p108-116;
Finkelstein et al., 2001, In "Methods of Microarray
Data Analysis. Papers from CAMDA", Eds. Lin & Johnson,
Kluwer Academic, p57-68; Yang et al., 2001, In "Optical
Technologies and Informatics", Eds. Bittner, Chen,
Dorsel & Dougherty, Proceedings of SPIE, 4266, p141-152;
Dudoit et al., 2000, J. Am. Stat. Ass., 97, p77-87; Alter
et al., 2000, supra; Newton et al., 2001, J. Comp. Biol.,
8, p37-52). Generally, a scaling factor or function is
first calculated to correct the intensity effect and
then used for normalising the intensities. The use of
external controls has also been suggested for improved
normalization.
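
The simplest normalization mentioned above, division of each spot by the median intensity of all spots on the array, could look like the following minimal sketch. The function name and the intensities are hypothetical.

```python
from statistics import median

def normalize_by_median(intensities):
    """Divide every spot intensity by the median intensity of the array."""
    centre = median(intensities)
    if not centre > 0:
        raise ValueError("median intensity must be positive")
    return [value / centre for value in intensities]

# Two hypothetical arrays of the same sample; the second is twice as bright overall.
array_a = [10.0, 20.0, 40.0]
array_b = [20.0, 40.0, 80.0]
print(normalize_by_median(array_a))  # [0.5, 1.0, 2.0]
print(normalize_by_median(array_b))  # [0.5, 1.0, 2.0]
```

After normalization the two arrays give the same relative intensities, although their raw signals differ by a constant factor.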

One other major challenge encountered in
large-scale gene expression analysis is that of
standardization of data collected from experiments
performed at different times. We have observed that
gene expression data for samples acquired in the same
experiment can be efficiently compared following
background correction and normalization. However, the
data from samples acquired in experiments performed at
different times requires further standardization prior
to analysis. This is because subtle differences in
experimental parameters between different experiments,
for example, differences in the quality and quantity of
mRNA extracted at different times, differences in time
used for target molecule labelling, hybridization time
or exposure time, can affect the measured values. Also,
factors such as the nature of the sequence of
transcripts under investigation (their GC content) and
their amount in relation to each other determine
how they are affected by subtle variations in the
experimental processes. They determine, for example,
how efficiently first strand cDNAs, corresponding to a
particular transcript, are transcribed and labelled
during first strand synthesis, or how efficiently the
corresponding labelled target molecules bind to their
complementary sequences during hybridization. Batch to
batch difference in the printing process is also a major
factor for variation in the generated expression data.

Failure to properly address and rectify these
influences leads to situations where the differences
between the experimental series may overshadow the main
information of interest contained in the gene expression
data set, i.e. the differences within the combined data
from the different experimental series. Figure 1
provides one such example showing a classification based
on Principal Component Analysis (PCA) of combined data
from two experimental series where the main goal is to
distinguish between Alzheimer/non-Alzheimer patients.

PCA (also known as singular value decomposition) is
a technique for studying interdependencies and
underlying relationships of a set of variables. The
data are modelled in terms of a few significant factors
or principal components (PCs), plus residuals. The
PCs contain the main phenomena and define the
systematic variability present in the data, while the
residuals represent the variability interpreted as
noise. Details on PCA can be found in Jolliffe (1986,
Principal Component Analysis, Springer-Verlag, NY), and
Jackson (1991, A User's Guide to Principal Components,
Wiley, NY). The results of Figure 1 show that two
clusters are formed representing the data from two
experimental series rather than the
Alzheimer/non-Alzheimer differentiation. There were
eight samples in common between the two series of
experiments, which ideally should have fallen on top of,
or in near proximity to, each other if appropriately
standardized.
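
The decomposition of a data matrix into principal components plus residuals can be illustrated with a small NumPy sketch. The function name, the column centring and the toy data are assumptions for illustration and do not reproduce the analysis behind Figure 1.

```python
import numpy as np

def pca_scores(data, n_components):
    """Return (scores, loadings, residuals) for a samples-by-genes matrix."""
    centred = data - data.mean(axis=0)          # centre each gene (column)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T
    residuals = centred - scores @ loadings.T   # variation treated as noise
    return scores, loadings, residuals

# Hypothetical data: 4 samples x 3 genes lying exactly along one direction.
data = np.array([[1.0, 2.0, 3.0],
                 [2.0, 4.0, 6.0],
                 [3.0, 6.0, 9.0],
                 [4.0, 8.0, 12.0]])
scores, loadings, residuals = pca_scores(data, n_components=1)
print(np.allclose(residuals, 0.0))  # True: one component explains everything
```

Plotting the first two columns of the scores for real data gives the kind of score plot in which clusters of samples, such as the two experimental series, become visible.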

We have now found that gene expression data between
different experiments can be efficiently standardized by
including a subset of samples from one experimental
series in the next experimental series and using a
direct standardization method (DS), originally described
by Wang and Kowalski (Anal. Chem., 1991, 63, p2750 and
J. Chemometrics, 1991, 5, p129-145). Although the
method of DS is well known in the field of analytical
chemistry, it remains undescribed and unused in the
field of gene expression data analysis.

In DS, the secondary data representing for example
experimental series 2 (secondary measurements, $R_2$) are
corrected to match the data measured on the primary
measurements representing data from series 1 ($R_1$), while
the calibration model remains unchanged. In DS,
response matrices for both experimental series are
related to each other by a transformation matrix $F$, i.e.

$$R_1 = R_2 F \qquad (1)$$

where $F$ is a square matrix dimensioned gene by
gene. From (1), the transformation matrix is calculated
as:

$$F = R_2^{+} R_1 \qquad (2)$$

The transformation matrix $F$ in equation (2) is
calculated using a relatively small subset of samples
which are measured on both the master primary and the
secondary series of data.

Finally, the response of the unknown sample
measured on the secondary series, $r_{2,un}^{T}$, is standardized
to the response vector $r_{1,un}^{T}$ expected from the primary
series:

$$r_{1,un}^{T} = r_{2,un}^{T} F \qquad (3)$$

From the preceding equation it can be seen that the
column i of the transformation matrix contains the
multiplication factors for a set of genes measured in
the secondary series to obtain the intensity at spot i
of the corrected series.
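
A minimal numerical sketch of direct standardization is given below, assuming that the transformation matrix is obtained with the Moore-Penrose pseudoinverse of the secondary-series responses (NumPy's `numpy.linalg.pinv`). The function name, the per-gene gain model and all values are hypothetical and serve only to make the example checkable.

```python
import numpy as np

def direct_standardization(r1_shared, r2_shared):
    """Estimate F such that r1_shared is approximately r2_shared @ F."""
    # Pseudoinverse of the secondary-series responses for the shared samples.
    return np.linalg.pinv(r2_shared) @ r1_shared

# Hypothetical example: 3 shared samples x 3 genes; series 2 reads each gene
# with a different gain relative to series 1.
r1_shared = np.array([[1.0, 2.0, 3.0],
                      [2.0, 1.0, 4.0],
                      [3.0, 5.0, 1.0]])
gains = np.array([2.0, 0.5, 4.0])
r2_shared = r1_shared * gains

f = direct_standardization(r1_shared, r2_shared)

# Standardize a new sample measured only in series 2 to the series 1 scale.
r2_unknown = np.array([4.0, 4.0, 4.0]) * gains
r1_estimate = r2_unknown @ f
print(np.allclose(r1_estimate, [4.0, 4.0, 4.0]))  # True
```

In this toy case the shared samples determine the transformation exactly; with real data, where genes far outnumber the shared samples, the pseudoinverse would give a least-squares estimate restricted to the retained rank.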

The number of samples that are repeated in the
experimental series, $R_1$ and $R_2$, should be equal to their
ranks, which in this case is equal to the number of
principal components retained for explaining the
variation in $R_1$ and $R_2$. For example, if three
principal components are retained for explaining the
variation in the data set, a minimum of three samples
should be repeated between $R_1$ and $R_2$. The samples that
should be repeated between different series should
ideally be those that exhibit high leverages in the gene
expression pattern. At times, two samples may suffice,
while at other times, more than two samples should
ideally be included for good representativity. In some
cases, the samples selected can be the same in all the
experimental series to be compared (reference samples),
while in other cases, representative samples can be
selected sequentially by analyzing the expression
pattern after each experiment. The selected samples
with high leverages are then included in the next
experimental series. The results of using Direct
Standardization are shown in Figure 1.

Another approach for normalizing and standardizing
the gene expression data set is to hybridize each DNA
array with target molecules prepared from a test sample
and an equal amount of labelled target molecules
prepared from representative reference samples. In
order to measure the intensity of labelled target
molecules hybridizing to the immobilized probes it is
necessary that the labelled molecules are prepared from
test and reference samples using different labels; for
example, different fluorescent dyes can be used for
preparing the labelled material. The labelled molecules
prepared from reference samples can be added to the
hybridization solution together with the labelled
material prepared from test samples. A data file from
each array representing the expression pattern of
different genes in the test sample and reference samples
can then be obtained, normalized and standardized by the
direct standardization method as described above. An
instant advantage of including the differentially
labelled target molecules from reference samples during
hybridization is that it enables an efficient comparison
of new test samples to the data sets already stored in a
database.

Monitoring the expression of a large number of
genes in several samples leads to the generation of a
large amount of data that is too complex to be easily
interpreted. Several unsupervised and supervised
multivariate data analysis techniques have already been
shown to be useful in extracting meaningful biological
information from these large data sets. Cluster
analysis is by far the most commonly used technique for
gene expression analysis, and has been performed to
identify genes that are regulated in a similar manner,
and/or to identify new/unknown tumour classes using gene
expression profiles (Eisen et al., 1998, PNAS, 95,
p14863-14868; Alizadeh et al., 2000, supra; Perou et al.,
2000, Nature, 406, p747-752; Ross et al., 2000, Nature
Genetics, 24(3), p227-235; Herwig et al., 1999, Genome
Res., 9, p1093-1105; Tamayo et al., 1999, PNAS,
96, p2907-2912).

In the clustering method, genes are grouped into
functional categories (clusters) based on their
expression profile, satisfying two criteria: homogeneity
- the genes in the same cluster are highly similar in
expression to each other; and separation - genes in
different clusters have low similarity in expression to
each other.

Examples of various clustering techniques that have
been used for gene expression analysis include
hierarchical clustering (Eisen et al., 1998, supra;
Alizadeh et al., 2000, supra; Perou et al., 2000, supra;
Ross et al., 2000, supra), K-means clustering (Herwig et
al., 1999, supra; Tavazoie et al., 1999, Nature Genetics,
22(3), p281-285), gene shaving (Hastie et al., 2000,
Genome Biology, 1(2), research 0003.1-0003.21), block
clustering (Tibshirani et al., 1999, Tech. report, Univ.
Stanford), Plaid model (Lazzeroni, 2002, Stat. Sinica,
12, p61-86), and self-organizing maps (Tamayo et al.,
1999, supra). Also, related methods of multivariate
statistical analysis, such as those using the singular
value decomposition (Alter et al., 2000, PNAS, 97(18),
p10101-10106; Ross et al., 2000, supra) or
multidimensional scaling can be effective at reducing
the dimensions of the objects under study.

However, methods such as cluster analysis and
singular value decomposition are purely exploratory and
only provide a broad overview of the internal structure
present in the data. They are unsupervised approaches
in which the available information concerning the nature
of the class under investigation is not used in the
analysis. Often, the nature of the biological
perturbation to which a particular sample has been
subjected is known. For example, it is sometimes known
whether the sample whose gene expression pattern is
being analysed derives from a diseased or healthy
individual. In such instances, discriminant analysis
can be used for classifying samples into various groups
based on their gene expression data.

In such an analysis one builds a classifier, by
training on the data, that is capable of discriminating
between members and non-members of a given class. The
trained classifier can then be used to predict the class
of unknown samples. Examples of discrimination methods
that have been described in the literature include
Support Vector Machines (Brown et al., 2000, PNAS, 97,
p262-267), Nearest Neighbour (Dudoit et al., 2000,
supra), Classification trees (Dudoit et al., 2000,
supra), Voted classification (Dudoit et al., 2000,
supra), Weighted Gene voting (Golub et al., 1999, supra),
and Bayesian classification (Keller et al., 2000, Tech.
report, Univ. of Washington). Also a technique in which
PLS (Partial Least Squares) regression analysis is first
used to reduce the dimensions in the gene expression
data set followed by classification using logistic
discriminant analysis and quadratic discriminant
analysis (LD and QDA) has recently been described
(Nguyen & Rocke, 2002, Bioinformatics, 18, p39-50 and
p1216-1226).

A challenge that gene expression data poses to
classical discriminatory methods is that the number of
genes whose expression is being analysed is very large
compared to the number of samples being analysed.
However in most cases only a small fraction of these
genes are informative in discriminant analysis problems.
Moreover, there is a danger that the noise from
irrelevant genes can mask or distort the information
from the informative genes. Several methods have been
suggested in literature to identify and select genes
that are informative in microarray studies, for example,
t-statistics (Dudoit et al., 2002, J. Am. Stat. Ass., 97,
p77-87), analysis of variance (Kerr et al., 2000, PNAS,
98, p8961-8965), Neighbourhood analysis (Golub et al.,
1999, supra), Ratio of between groups to within groups
sum of squares (Dudoit et al., 2002, supra), Non
parametric scoring (Park et al., 2002, Pacific Symposium
on Biocomputing, p52-63) and Likelihood selection
(Keller et al., 2000, supra).

In the methods described herein the gene expression
data that has been normalized and standardized is
analysed by using Partial Least Squares Regression
(PLSR). Although PLSR is primarily a method used for
regression analysis of continuous data (see Appendix A),
it can also be utilized as a method for model building
and discriminant analysis using a dummy response matrix
based on a binary coding. The class assignment is based
on a simple dichotomous distinction such as breast
cancer (class 1) / healthy (class 2), or a multiple
distinction based on multiple disease diagnosis such as
breast cancer (class 1) / Alzheimer (class 2) / healthy
(class 3). The list of diseases for classification can
be increased depending upon the samples available
corresponding to other diseases or conditions or stages
thereof.

PLSR applied as a classification method is referred
to as PLS-DA (DA standing for Discriminant analysis).
PLS-DA is an extension of the PLSR algorithm in which
the Y-matrix is a dummy matrix containing n rows
(corresponding to the number of samples) and K columns
(corresponding to the number of classes). The Y-matrix
is constructed by inserting 1 in the kth column and -1
in all the other columns if the corresponding ith object
of X belongs to class k. By regressing Y onto X,
classification of a new sample is achieved by selecting
the group corresponding to the largest component of the
fitted $\hat{y}(x) = (\hat{y}_1(x), \hat{y}_2(x), \ldots, \hat{y}_K(x))$. Thus, in a
-1/1 response matrix, a prediction value below 0 means
that the sample belongs to the class designated as -1,
while a prediction value above 0 implies that the sample
belongs to the class designated as 1.
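
The dummy coding and the class assignment rule described above might be sketched as follows. Only the coding and the "largest fitted component" rule are shown; the regression step itself is omitted, and the function names and fitted values are hypothetical.

```python
def dummy_response_matrix(labels, classes):
    """Build the n x K matrix with 1 in the column of each sample's class, -1 elsewhere."""
    return [[1 if label == cls else -1 for cls in classes] for label in labels]

def assign_class(fitted_row, classes):
    """Pick the class whose fitted response component is largest."""
    best = max(range(len(classes)), key=lambda k: fitted_row[k])
    return classes[best]

classes = ["breast cancer", "Alzheimer", "healthy"]
labels = ["healthy", "breast cancer", "Alzheimer"]
print(dummy_response_matrix(labels, classes))
# [[-1, -1, 1], [1, -1, -1], [-1, 1, -1]]

# Hypothetical fitted responses for one new sample.
print(assign_class([-0.8, 0.3, -0.4], classes))  # Alzheimer
```

With only two classes a single -1/1 column suffices, and the sign of the fitted value then decides the class.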

An advantage of PLSR-DA is that the results
obtained can be easily represented in the form of two
different plots, the score and loading plots. Score
plots represent a projection of the samples onto the
principal components and show the distribution of the
samples in the classification model and their
relationship to one another. Loading plots display
correlations between the variables present in the data
set.

It is usually recommended to use PLS-DA as a
starting point for the classification problem due to its
ability to handle collinear data, and the property of
PLSR as a dimension reduction technique. Once this
purpose has been satisfied, it is possible to use other
methods such as Linear discriminant analysis (LDA), which
has been shown to be effective in extracting further
information (Indahl et al., 1999, Chem. and Intell. Lab.
Syst., 49, p19-31). This approach is based on first
decomposing the data using PLS-DA, and then using the
scores vectors (instead of the original variables) as
input to LDA. Further details on LDA can be found in
Duda and Hart (Classification and Scene Analysis, 1973,
Wiley, USA).

The next step following model building is model
validation. This step is considered to be amongst the
most important aspects of multivariate analysis, and
tests the "goodness" of the calibration model which has
been built. In this work, a cross validation approach
has been used for validation. In this approach, one or
a few samples are kept out in each segment while the
model is built using a full cross-validation on the
basis of the remaining data. The samples left out are
then used for prediction/classification. Repeating the
simple cross-validation process several times holding
different samples out for each cross-validation leads to
a so-called double cross-validation procedure. This
approach has been shown to work well with a limited
amount of data, as is the case in some of the Examples
described here. Also, since the cross validation step
is repeated several times the dangers of model bias and
overfitting are reduced.
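
The way samples are held out segment by segment can be sketched as below. This illustrates only the splitting; model building and prediction are left out. Keeping repeated measurements of the same sample in the same segment, as well as the function name and sample identifiers, are assumptions of this sketch.

```python
def cross_validation_segments(sample_ids):
    """Yield (training_indices, held_out_indices), one segment per unique sample id."""
    seen = []
    for sample in sample_ids:
        if sample not in seen:
            seen.append(sample)
    for sample in seen:
        held_out = [i for i, s in enumerate(sample_ids) if s == sample]
        training = [i for i, s in enumerate(sample_ids) if s != sample]
        yield training, held_out

# Hypothetical data set in which sample "A" was measured twice.
ids = ["A", "B", "A", "C"]
for training, held_out in cross_validation_segments(ids):
    print(training, held_out)
# [1, 3] [0, 2]
# [0, 2, 3] [1]
# [0, 1, 2] [3]
```

Holding out all measurements of a sample together avoids the situation in which a model is tested on a sample it has effectively already seen.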

Once a calibration model has been built and
validated, genes exhibiting an expression pattern that
is most relevant for describing the desired information
in the model can be selected by techniques described in
the prior art for variable selection, as mentioned
elsewhere. Variable selection will help in reducing the
final model complexity, provide a parsimonious model,
and thus lead to a reliable model that can be used for
prediction. Moreover, use of fewer genes for the
purpose of providing diagnosis will reduce the cost of
the diagnostic product. In this way informative probes
which would bind to the genes of relevance may be
identified.

We have found that after a calibration model has
been built, statistical techniques like Jackknife
(Efron, 1982, The Jackknife, the Bootstrap and other
resampling plans. Society for Industrial and Applied
Mathematics, Philadelphia, USA), based on resampling
methodology, can be efficiently used to select or
confirm significant variables (informative probes).

20 The approximate uncertainty variance of the PLS 

regression coefficients B can be estimated by: 

M 

S 2 B = £ ((B-B m )gr) 2 
25 m=l 

where 

S 2 B = estimated uncertainty variance of B; 

B = the regression coefficient at the cross validated 
3 0 rank A using all the N objects; 

B m - the regression coefficient at the rank A using all 

objects except the object (s) left out in cross 

validation segment m; and 

g = scaling coefficient (here: g=l) . 
In our approach, Jackknife has been implemented
together with cross-validation. For each variable the
difference between the B-coefficient B_i in a
cross-validated sub-model and B_tot for the total model is
first calculated. The sum of the squares of the
differences is then calculated over all sub-models to
obtain an expression of the variance of the B_i estimate
for a variable. The significance of the estimate of B_i
is calculated using the t-test. Thus, the resulting
regression coefficients can be presented with
uncertainty limits that correspond to 2 standard
deviations, and from that significant variables are
detected.
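
The Jackknife calculation described above can be sketched in
Python as follows. This is a minimal illustration under stated
assumptions, not the implementation found in The Unscrambler:
to stay dependency-free, a per-gene least-squares slope stands
in for the PLS regression coefficient, the names `slope` and
`jackknife_coefficients` and the demo data are hypothetical,
and a coefficient is flagged as significant when its magnitude
exceeds 2 jackknife standard deviations.

```python
from math import sqrt

def slope(x, y):
    """Least-squares slope of y on x (stand-in for a PLS coefficient)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def jackknife_coefficients(X, y, segments, g=1.0):
    """X: rows = samples, columns = genes; y: class dummy (+1 / -1).
    segments: lists of row indices left out together (one list per
    cross-validation segment m).
    Returns one (B, S2B, significant) tuple per gene."""
    results = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        b_full = slope(col, y)                 # B: all N objects
        s2 = 0.0
        for seg in segments:
            left_out = set(seg)
            keep = [i for i in range(len(X)) if i not in left_out]
            b_m = slope([col[i] for i in keep], [y[i] for i in keep])
            s2 += ((b_full - b_m) * g) ** 2    # ((B - B_m) g)^2
        # Assumption: "significant" means |B| > 2 standard deviations.
        significant = abs(b_full) > 2.0 * sqrt(s2)
        results.append((b_full, s2, significant))
    return results

# Hypothetical demo: gene 0 tracks the class, gene 1 does not.
y_demo = [1, 1, 1, -1, -1, -1]
X_demo = [[2.0, 1.0], [2.1, -1.0], [1.9, 0.2],
          [0.0, 0.9], [0.1, -1.1], [-0.1, 0.1]]
segments_demo = [[i] for i in range(6)]        # leave one sample out
demo = jackknife_coefficients(X_demo, y_demo, segments_demo)
```

In practice `slope` would be replaced by refitting the PLSR
model on the retained samples and reading off its coefficient
for each gene; the accumulation of squared differences over
the segments stays the same.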

No further details as to the implementation or use
of this step are provided here since this has been
implemented in commercially available software (The
Unscrambler, CAMO ASA, Norway). Also, details on
variable selection using Jackknife can be found in
Westad & Martens (2000, J. Near Inf. Spectr., 8,
p117-124).

The following approach can be used to select
informative probes from a gene expression data set:

a) keep out one unique sample (including its
repetitions if present in the data set) per cross
validation segment;

b) build a calibration model (cross validated
segment) on the remaining samples using PLSR-DA;

c) select the significant genes for the model in
step b) using the Jackknife criterion;

d) repeat the above 3 steps until all the unique
samples in the data set are kept out once (as described
in step a)). For example, if 75 unique samples are
present in the data set, 75 different calibration models
are built resulting in a collection of 75 different sets
of significant probes;

e) select the most significant variables using
the frequency of occurrence criterion in the generated
sets of significant probes in step d). For example, a
set of probes appearing in all sets (100%) is more
informative than probes appearing in only 50% of the
generated sets in step d).

Once the informative probes for a disease have been
selected, a final model is made and validated. The two
most commonly used ways of validating the model are
cross-validation (CV) and test set validation. In
cross-validation, the data is divided into k subsets.
The model is then trained k times, each time leaving out
one of the subsets from training and using only the
omitted subset to compute the error criterion, RMSEP (Root
Mean Square Error of Prediction). If k equals the
sample size, this is called "leave-one-out"
cross-validation. The idea of leaving one or a few samples
out per validation segment is valid only in cases where
the covariance between the various experiments is zero.
Thus, a one-sample-at-a-time approach cannot be justified
in situations containing replicates, since keeping only
one of the replicates out will introduce a systematic
bias in the analysis. The correct approach in this case
is to leave out all replicates of the same sample
at a time, since that satisfies the assumption of zero
covariance between the CV-segments.
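
The replicate-aware cross-validation described above can be
sketched as follows. This is an illustrative outline only: the
function names, the `fit` and `predict` callbacks and the
sample identifiers are assumptions introduced here, and any
model (for example a PLSR-DA fit) could be plugged in through
the callbacks.

```python
from collections import defaultdict
from math import sqrt

def replicate_segments(sample_ids):
    """Group row indices by sample ID so that all replicates of a
    sample fall into the same cross-validation segment."""
    groups = defaultdict(list)
    for idx, sid in enumerate(sample_ids):
        groups[sid].append(idx)
    return list(groups.values())

def rmsep(y_true, y_pred):
    """Root Mean Square Error of Prediction."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def cross_validate(X, y, sample_ids, fit, predict):
    """fit(X_train, y_train) -> model;
    predict(model, X_test) -> list of predictions.
    Each segment is predicted by a model that never saw it."""
    y_pred = [None] * len(y)
    for seg in replicate_segments(sample_ids):
        held = set(seg)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(seg, predict(model, [X[i] for i in seg])):
            y_pred[i] = p
    return rmsep(y, y_pred)

# Hypothetical usage with a trivial "predict the training mean" model.
ids = ["s1", "s1", "s2", "s3", "s3"]          # s1 and s3 have replicates
y_demo = [1, 1, 1, -1, -1]
X_demo = [[0.0]] * 5                          # features unused by this model
mean_fit = lambda Xt, yt: sum(yt) / len(yt)
mean_predict = lambda model, Xs: [model] * len(Xs)
error = cross_validate(X_demo, y_demo, ids, mean_fit, mean_predict)
```

Grouping by sample identifier rather than by row is what keeps
a replicate of a held-out sample from leaking into the
training set.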

The second approach for model validation is to use
a separate test set for validating the calibration
model. This requires running a separate set of
experiments to be used as a test set. This is the
preferred approach provided that real test data are
available.

The final model is then used to identify a disease,
condition or stage thereof in test samples. For this
purpose, expression data of selected informative genes
is generated from test samples and then the final model
is used to determine whether a sample belongs to a
diseased or non-diseased class or has a condition or
stage thereof.

Thus viewed from a yet further aspect the present
invention provides a method of identifying probes useful
for diagnosing or identifying or monitoring a disease or
condition or stage thereof in an organism, comprising
the steps of:

a) immobilizing a set of oligonucleotide probes,
preferably as described hereinbefore, on a
solid support;

b) isolating mRNA from a sample of a normal
organism (normal sample), which may optionally
be reverse transcribed to cDNA;

c) isolating mRNA from a sample from an organism,
corresponding to the sample and organism of
step (b), which is known to have said disease
or condition or a stage thereof (diseased
sample), which may optionally be reverse
transcribed to cDNA;

d) hybridizing the mRNA or cDNA of steps (b) and
(c) to said set of immobilized oligonucleotide
probes of step (a);

e) assessing the amount of mRNA or cDNA
hybridizing to each of said oligonucleotide
probes to determine the level of gene
expression of genes to which said
oligonucleotide probes bind in said normal and
diseased samples to generate a gene expression
data set for each sample;

f) normalizing and standardizing said data set of
step (e);

g) constructing a calibration model for
classification, preferably using the
statistical techniques Partial Least Squares
Discriminant Analysis (PLS-DA) and Linear
Discriminant Analysis (LDA); and

h) performing Jackknife analysis and identifying
those oligonucleotide probes which are
required for classification of said disease
and normal samples into their respective
groups.

Preferably a model for classification purposes is
generated by using the data relating to the probes
identified according to the above described method.
Preferably the sample is as described previously.
Preferably the oligonucleotides which are immobilized in
step (a) are randomly selected as described below or are
the probes as described hereinbefore. Such
oligonucleotides may be of considerable length, e.g. if
using cDNA (which is encompassed within the scope of the
term "oligonucleotide"). The identification of such
cDNA molecules as useful probes allows the development
of shorter oligonucleotides which reflect the
specificity of the cDNA molecules but are easier to
manufacture and manipulate.

The above described model may then be used to
generate and analyse data of test samples and thus may
be used for the diagnostic methods of the invention. In
such methods the data generated from the test sample
provides the gene expression data set and this is
normalized and standardized as described above. This is
then fitted to the calibration model described above to
provide classification.

The method described herein can also be used to
simultaneously select informative probes for several
related and unrelated diseases or conditions. Depending
upon which diseases or conditions have been included in
the calibration or training set, informative probes can
be selected for the said diseases or conditions. The
informative probes selected for one disease or condition
may or may not be similar to the informative probes
selected for another disease or condition of interest.
It is the pattern with which the selected genes are
expressed in relation to each other during a disease,
condition, or stage thereof, that determines whether or
not they are informative for the disease, condition or
stage thereof.

In other words, informative genes are selected
based on how their expression correlates with the
expression of other selected informative genes under the
influence of responses generated by the disease,
condition or stage thereof under investigation. In
Examples 1 and 2 provided hereinafter, 139 informative
probes were selected for breast cancer diagnosis and 182
probes were selected for Alzheimer's disease diagnosis
by training on the gene expression data set of genes
represented by 1435 or 758 randomly picked cDNA clones for
breast cancer/non-breast cancer samples, or
Alzheimer/non-Alzheimer samples, respectively. Among
the probes selected for breast cancer and Alzheimer's
disease, about 10 probes were informative for both breast
cancer and Alzheimer's disease diagnosis.

For the purpose of isolating informative probes or
identifying several related and unrelated diseases,
conditions and stages thereof simultaneously, the gene
expression data set must contain the information on how
genes are expressed when the subject has a particular
disease, condition or stage thereof under investigation.
The data set is generated from a set of healthy or
diseased samples, where a particular sample may contain
the information of only one disease, condition or stage
thereof or may also contain information about multiple
diseases, conditions or stages thereof. For example, if
the isolation of informative probes for Alzheimer's
disease, breast cancer and diabetes is sought, whole
blood samples can be obtained from an Alzheimer patient
who has breast cancer and diabetes. Hence, the method
also teaches an efficient experimental design to reduce
the number of samples required for isolating informative
probes by selecting samples representing more than one
disease, condition or stage thereof.

As mentioned previously, in view of the high
information content of most transcripts, the
identification and selection of informative probes for
use in diagnosing, monitoring or identifying a
particular disease, condition or stage thereof may be
dramatically simplified. Thus the pool of genes from
which a selection may be made to identify informative
probes may be radically reduced.

Unlike prior art technologies, in which informative
probes are selected from a population of thousands of
genes that are being expressed in a cell (as in
microarray analysis), in the method described herein the
informative probes are selected from a limited number of
randomly obtained genes. For example, from a population
of 1435 cDNA clones, randomly picked from a human whole
blood cDNA library, we were able to select 139
informative probes for breast cancer diagnosis (see
Example 1 and Table 2).

Thus in a preferred aspect of the above mentioned
method of identifying probes useful for diagnosing or
identifying or monitoring a disease or condition or
stage thereof in an organism, said set of
oligonucleotides which are immobilized in step (a) are
randomly selected from a larger set of oligonucleotides,
e.g. from a cDNA library or other oligonucleotide pool,
which may be, but is preferably not, selected from the
set provided herein. Preferably said larger set
comprises oligonucleotides which correspond to
moderately or highly expressed genes. Thus preferably
in methods of the invention, the set of oligonucleotides
according to the invention are replaced with a set of
oligonucleotides which are randomly selected, e.g. from
commercially available oligonucleotide or cDNA
libraries.

As referred to herein "random" refers to selection
which is not biased based on the extent of information
carried by the transcripts in relation to the disease,
condition or organism under study, i.e. without bias
towards their likely utility as informative probes.
Whilst a random selection may be made from a pool of
transcripts (or related products) which have been
biased, e.g. to highly or moderately expressed
transcripts, preferably random selection is made from a
pool of transcripts not biased or selected by a
sequence-based criterion. The larger set may therefore
contain oligonucleotides corresponding to highly and
moderately expressed genes, or alternatively, may be
enriched for those corresponding to the highly and
moderately expressed genes.

Random selection from highly and moderately
expressed genes can be achieved in a wide variety of
ways. A strategy used in this work, but not limiting in
itself, involves randomly picking a significant number of
cDNA clones from a cDNA library constructed from a
biological specimen under investigation. Since, in a
cDNA library, the cDNA clones corresponding to
transcripts present in high or moderate amounts are more
frequent than clones corresponding to transcripts
present in low amounts, the former will tend to be
picked up more frequently than the latter. A pool of
cDNA enriched for those corresponding to highly and
moderately expressed genes can be isolated by this
approach.
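
The enrichment argument above can be illustrated with a small
simulation. This is a sketch only: the transcript names and
relative abundances are invented for illustration, the
function `pick_clones` is hypothetical, and clones are drawn
with replacement, which approximates picking from a library
that is much larger than the number of picks.

```python
import random

def pick_clones(abundance, n_picks, seed=0):
    """abundance: dict mapping transcript name -> relative number of
    clones in the library. Returns how often each transcript was
    picked in n_picks random draws."""
    rng = random.Random(seed)                  # fixed seed for repeatability
    names = list(abundance)
    weights = [abundance[name] for name in names]
    picks = rng.choices(names, weights=weights, k=n_picks)
    return {name: picks.count(name) for name in names}

# Hypothetical library: one abundant, one moderate, one rare transcript.
counts = pick_clones({"high": 1000, "moderate": 100, "low": 1}, 1435)
```

Because each draw is proportional to clone frequency, the
picked pool is dominated by the abundant and moderately
abundant transcripts without any sequence information being
used.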

To identify genes that are expressed in high or
moderate amount among the isolated population for use in
methods of the invention, the information about the
relative level of their transcripts in samples of
interest can be generated using several prior art
techniques. Both non-sequence-based methods, such as
differential display or RNA fingerprinting, and
sequence-based methods such as microarrays or
macroarrays can be used for the purpose. Alternatively,
specific primer sequences for highly and moderately
expressed genes can be designed and methods such as
quantitative RT-PCR can be used to determine the levels
of highly and moderately expressed genes. Hence, a
skilled practitioner may use a variety of techniques
which are known in the art for determining the relative
level of mRNA in a biological sample.

Especially preferably the sample for the isolation
of mRNA in the above described method is as described
previously and is preferably not from the site of
disease and the cells in said sample are not disease
cells and have not contacted disease cells.

The following examples are given by way of 
illustration only in which the Figures referred to are 
as follows: 

Figure 1 shows the effect of Direct Standardization
(DS) on the Alzheimer data measured in two different
series of experiments, in which AD denotes Alzheimer's
samples and A, B are non-Alzheimer's samples. The
samples in both series have been labelled systematically
as (xx_7/xx_8), whereas the corrected samples from
series 8 (in b, c, d) have been labelled as (xx_c); thus,
for example, AD2-7 denotes Alzheimer disease sample
number 2 in experiment series 7. The circled spots
represent the samples chosen as the transfer samples.
The connecting lines in figures b, c, d show the proximity
of the replicated samples after applying DS. The dashed
lines in figures a, c, d represent the decision boundary
separating the classes. These lines have not been drawn
on the basis of any statistical criteria, but serve the
purpose of visually separating the classes. All
four figures show scores plots (PC1-PC2) from PCA
analysis: (a) based on non-standardized data, (b) scores
plot after direct standardization using 3 transfer
samples, (c) scores plot after direct standardization
using 4 transfer samples, (d) scores plot after direct
standardization using 8 transfer samples;

Figure 2 shows the projection of normal (including
benign) and breast cancer samples onto a classification
model generated by PLSR-DA using the data of 44
informative genes, in which PC is the principal
components and N and C are normal and breast cancer
samples, respectively;

Figure 3 shows the projection of individuals with
and without Alzheimer's disease onto a classification
model generated by PLSR-DA using 182 informative genes;

Figures 4, 6 and 8 show projection plots as Figure
2 in which the classification model is generated using
719, 111 and 345 cDNAs, respectively, wherein PC is the
principal components, N denotes normal and B denotes
breast cancer samples;

Figures 5, 7 and 9 show prediction plots based on 3
principal components using the data of 719, 111 and 345
cDNAs, respectively;

Figure 10 shows a projection plot as Figure 3 in
which the classification model is generated using 520
cDNAs; and

Figure 11 is the prediction plot corresponding to
Figure 10.

Example 1: Diagnosis of Breast Cancer

Methods

Whole blood was obtained from the arms of breast cancer
patients and patients with benign tumours (Ulleval and
Haukland hospitals in Norway). All of the patients with
breast cancer had a malignant tumour of the breast
(disease samples). Healthy blood was collected from the
above two hospitals, or collected at a Health station at
As, Norway or at DiaGenic AS, Norway, from the arms of
female donors with no reported signs of breast cancer.
The blood from healthy individuals or individuals with
benign tumours comprises the normal samples. The blood was
either collected in tubes containing EDTA and stored
immediately at -80°C or was collected in PAXgene tubes
and stored for 12-24 hours at room temperature before
finally being stored at -80°C before use. Further
details of the breast cancer and benign tumour patients
from which blood was taken are provided in Table 5.

mRNA was isolated from the blood of the 29 breast cancer
patients and 46 normal donors and used to prepare
labelled probes by reverse transcribing in the presence
of α-33P-dATP. The first strand cDNA of the normal and
diseased samples was bound, separately, to 1435 cDNA
clones immobilized on a solid support (nylon membrane).
These cDNA clones were randomly picked, without any
prior knowledge of their gene sequences, from a cDNA
library constructed using whole blood of 550 healthy
individuals (Clontech, Palo Alto, USA). These methods
were conducted as follows.

For amplification of inserts, bacterial clones were
grown in microtiter plates containing 150 µl LB with 50
µg/ml carbenicillin, and incubated overnight with
agitation at 37°C. To lyse the cells, 5 µl of each
culture were diluted with 50 µl H2O and incubated for 12
min. at 95°C. Of this mixture, 2 µl were subjected to a
PCR reaction using 20 pmoles of M13 forward and reverse
primer in presence of 1.5 mM MgCl2. PCR reactions were
performed with the following cycling protocol: 4 min. at
95°C, followed by 25 cycles of 1 min. at 94°C, 1 min. at
60°C and 3 min. at 72°C either in a RoboCycler®
Temperature Cycler (Stratagene, La Jolla, USA) or DNA
Engine Dyad Peltier Thermal Cycler (MJ Research Inc.,
Waltham, USA). The amplified products were denatured by
incubating with NaOH (0.2 M, final concentration) for 30
min. and spotted onto Hybond-N+ membranes (Amersham
Pharmacia Biotech, Little Chalfont, UK), using a MicroGrid
II workstation according to the manufacturer's
instructions (BioRobotics Ltd, Cambridge, England). The
immobilized cDNAs were fixed using a UV cross-linker
(Hoefer Scientific Instruments, San Francisco, USA).

In addition to the 1435 cDNAs, the printed arrays also
contained controls for assessing background level,
consistency and sensitivity of the assay. These were
spotted at multiple positions and included controls such
as PCR mix (without any insert); positive and negative
controls of the SpotReport™ 10 array validation system
(Stratagene, La Jolla, USA) and cDNAs corresponding to
constitutively expressed genes such as b-actin, g-actin,
GAPDH, HOD and cyclophilin. Also, oligonucleotides
corresponding to SIX1, b-tubulin, TRP-2, MDM2, Myosin
Light C, CD44, Maspin, Laminin, and SRP 19 were included
to detect disseminated cancer cells.

The total RNA from blood collected in EDTA tubes was
purified using the Trizol LS Reagent protocol
(Invitrogen/Life Technologies). From blood contained in
PAXgene tubes, the total RNA was purified according to
the supplier's instructions (PreAnalytiX, Hombrechtikon,
Switzerland). Contaminating DNA was removed from the
isolated RNA by DNase I treatment using the DNA-free kit
(Ambion, Inc., Austin, USA). RNA quality was determined
visually by inspecting the integrity of 28S and 18S
ribosomal bands following agarose gel electrophoresis.
The concentration and purity of extracted RNA was
determined by measuring the absorbance at 260 nm and 280
nm. mRNA was isolated from the total RNA using Dynabeads
as per the supplier's instructions (Dynal AS, Oslo,
Norway).

Labelling and hybridization experiments were performed
in batches. The number of samples assayed in each batch
varied from six to nine. In the case of samples that
were assayed more than once (replicates), aliquots
derived from the same mRNA pool were used for probe
synthesis. For probe synthesis, aliquots of mRNA
corresponding to 4-5 µg of total RNA were mixed together
with oligodT25NV (0.5 µg/ml) and mRNA spikes of the
SpotReport™ 10 array validation system (10 pg; Spike 2,
1 pg), heated to 70°C to remove secondary structures,
and then chilled on ice. Probes were prepared in 35 µl
reaction mixes by reverse transcription in the presence
of 50 µCi [α-33P]dATP, 3.5 µM dATP, 0.6 mM each of dCTP,
dTTP, dGTP, 200 units of Superscript reverse
transcriptase (Invitrogen, Life Technologies) and 0.1 M
DTT, labelling for 1.5 hr at 42°C. Following synthesis,
the enzyme was deactivated for 10 min. at 70°C and mRNA
removed by incubating the reaction mix for 20 min. at
37°C in 4 units of Ribo H (Promega, Madison, USA).
Unincorporated nucleotides were removed using ProbeQuant
G-50 Columns (Amersham Biosciences, Piscataway, USA).

Prior to hybridization, the membranes were equilibrated
in 4 x SSC for 2 hr at room temperature and
prehybridized overnight at 65°C in 10 ml
prehybridisation solution (4 x SSC, 0.1 M NaH2PO4, 1 mM
EDTA, 8% dextran sulphate, 10 x Denhardt's solution, 1%
SDS). Freshly prepared probes were added to 5 ml of the
same prehybridisation solution, and hybridization
continued overnight at 65°C. The membranes were washed
at 65°C at increasing stringency (2 x 30 min. each in 2
x SSC, 0.1% SDS; 1 x SSC, 0.1% SDS; 0.1 x SSC, 0.1% SDS)
to remove unspecific signals.

The amount of labelled first strand cDNA binding to each
spot was assessed and quantified using a PhosphorImager
to generate a gene expression data set. The data was
generated using Phoretix software version 3 (Non Linear
Dynamics, England). Background subtraction was
performed on the generated data by subtracting the
median of the line of pixels around each spot outline
from the total intensity obtained from the respective
spots.
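
The background subtraction step can be sketched as below. This
takes the description literally and is not the Phoretix
implementation: the function name and the pixel values are
hypothetical, and whether the median is scaled by spot area
before subtraction is not stated in the text, so no scaling is
applied here.

```python
from statistics import median

def background_subtract(total_intensity, outline_pixels):
    """Subtract the median of the pixels lying on the line around
    the spot outline from the spot's total intensity."""
    return total_intensity - median(outline_pixels)

# Hypothetical spot: one bright outlier pixel on the outline barely
# moves the median, which is why a median is a robust choice.
corrected = background_subtract(1500.0, [10, 12, 11, 50, 9])
```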

The background-subtracted data was then normalized and
transformed. First, the 50 lowest and 50 highest
signals from each membrane were selected out. This step
was to exclude genes that were expressed with a high
degree of variance. Since the genes selected out varied
from membrane to membrane, the expression data from 497
genes were removed from the data set. The values for the
remaining 938 genes were then normalised by using different
approaches such as external controls, dividing each spot
by the median intensity of the observed signal in the
respective membrane, range normalizing the data from
each membrane, and then log transforming the data
obtained.
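
One of the normalization routes described above (removal of
extreme signals, median scaling, log transformation) can be
sketched as follows. This is an illustration under explicit
assumptions: the function names and demo values are
hypothetical, a gene is dropped from every membrane if it is
among the extremes on any membrane, the median is taken over
the retained genes, the natural logarithm is used, and all
retained signals are assumed to be positive.

```python
from math import log
from statistics import median

def extreme_genes(membrane, k=50):
    """Names of the k lowest and k highest signals on one membrane.
    membrane: dict mapping gene name -> background-subtracted signal."""
    ranked = sorted(membrane, key=membrane.get)
    return set(ranked[:k]) | set(ranked[len(ranked) - k:])

def preprocess(membranes, k=50):
    """Drop genes that are extreme on any membrane, then divide each
    signal by its membrane's median and log transform."""
    drop = set()
    for membrane in membranes:
        drop |= extreme_genes(membrane, k)
    processed = []
    for membrane in membranes:
        kept = {g: v for g, v in membrane.items() if g not in drop}
        med = median(kept.values())
        processed.append({g: log(v / med) for g, v in kept.items()})
    return processed

# Hypothetical two-membrane data set with k=1 for brevity.
m1 = {"a": 1.0, "b": 2.0, "c": 4.0, "f": 5.0, "d": 8.0, "e": 16.0}
m2 = {"a": 2.0, "b": 1.0, "c": 4.0, "f": 5.0, "d": 16.0, "e": 8.0}
processed = preprocess([m1, m2], k=1)
```

Because the extreme genes differ between membranes, the union
that is removed is larger than 2k, which matches the
observation above that 497 genes, rather than 100, were
removed.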



The processed data obtained above was then used to
isolate the informative probes by:

a) keeping one unique sample (including all
repetitions of the selected sample) out per cross
validation segment;

b) building a calibration model (cross validated)
on the remaining samples using PLSR-DA;

c) selecting the set of significant genes for the
model in step b) using the Jackknife criterion;

d) repeating steps a), b) and c) until all the
unique samples were kept out once (hence, in all, 75
different calibration models were built (after repeating
step b) 75 times), resulting in 75 different sets of
significant probes (after repeating step c) 75 times));

e) selecting significant variables using the
frequency of occurrence criterion amongst the 75
different sets of significant probes.
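
Steps a) to e) can be outlined in code as follows. This is a
sketch rather than the procedure as actually run: the function
names are hypothetical, `find_significant` is a placeholder
callback standing in for steps b) and c) (building the PLSR-DA
model and applying the Jackknife criterion), and the probe and
sample names in the demo are invented.

```python
from collections import Counter

def select_by_occurrence(significant_sets, min_fraction=0.9):
    """Step e): keep probes that occur in at least min_fraction of
    the generated sets of significant probes."""
    counts = Counter(p for s in significant_sets for p in set(s))
    n_sets = len(significant_sets)
    return sorted(p for p, c in counts.items() if c / n_sets >= min_fraction)

def leave_one_sample_out_selection(sample_ids, find_significant,
                                   min_fraction=0.9):
    """Steps a) to d): hold out each unique sample (with all of its
    repetitions) once and collect one set of significant probes per
    held-out sample.
    find_significant(train_rows) -> iterable of probe names."""
    sets = []
    for held_out in sorted(set(sample_ids)):
        train_rows = [i for i, s in enumerate(sample_ids) if s != held_out]
        sets.append(set(find_significant(train_rows)))
    return select_by_occurrence(sets, min_fraction)

# Hypothetical demo: "g1" is always significant, "g2" only when
# row 0 is part of the training data.
demo_ids = ["a", "a", "b", "c"]
demo_finder = lambda rows: {"g1", "g2"} if 0 in rows else {"g1"}
strict = leave_one_sample_out_selection(demo_ids, demo_finder, 0.9)
lenient = leave_one_sample_out_selection(demo_ids, demo_finder, 0.6)
```

With 75 unique samples the loop runs 75 times, and a threshold
of 0.9 corresponds to probes appearing in at least 90% of the
generated sets.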

The selected informative probes based on the occurrence
criterion were used to construct a classification model.
The result of the classification model based on probes
appearing in at least 90% of the generated sets after
the step of isolating informative probes as described
above is shown in Figure 2, in which it is seen that the
expression pattern of these genes was able to classify
most women with breast cancer and women with no breast
cancer into distinct groups. In this figure PC1 and PC2
indicate the two principal components statistically
derived from the data which best define the systematic
variability present in the data. This allows each
sample, and the data from each of the informative probes
to which the sample's labelled first strand cDNA was
bound, to be represented on the classification model as
a single point which is a projection of the sample onto
the principal components - the score plot.

The ability of the generated model, based on isolated
informative probes, to predict future samples was
determined by the double cross-validation approach. The
performance of the diagnostic test for breast cancer
based on the occurrence criterion is presented in Table
6.

Correct prediction of most breast cancer samples was
achieved. These included all three samples obtained
from women with ductal carcinoma in situ (DCIS), 11/15
samples obtained from women with stage I breast cancer,
all five samples obtained from women with stage II
breast cancer, and one of two samples obtained from
women with stage III breast cancer. Interestingly, two
correctly predicted stage I samples were obtained from
women having a tumour size of <5 mm in diameter.

The model also correctly predicted the class of most
non-cancer samples (41/46), including those that were
obtained from women with non-cancerous breast
abnormalities.

That the gene transcripts are not from
cells which are disseminated disease cells has been
confirmed by several lines of evidence. Firstly, the
informative genes were expressed constitutively at high
or moderate levels in blood cells of women irrespective
of whether they had cancer or not. Secondly, in the
assay described in this Example, in order to identify
transcripts, at least 720 disseminated cells in blood
samples would be required. Since the average number of
disseminated cells present in blood during different
stages of breast cancer is much lower (organ confined
breast cancer, 0.8 cells per ml; invasive breast cancer
spread to lymph nodes only, 2.4 cells per ml; and
metastatic breast cancer, 6 cells per ml; SD>100%) (29),
we believe that the signals being detected originated
from peripheral blood cells and could not have
originated from disseminated cells. Thirdly, we were
not able to detect any signal from the eight cancer
markers known to have elevated expression in malignant
cancer cells, including cancer cells that are
disseminated in the blood.

Example 2: Diagnosis of Alzheimer's disease

Similar experiments were conducted with samples from
Alzheimer's patients. In this study, 7 patients
diagnosed with Alzheimer's disease at the Memory Clinic
at Ulleval University Hospital were used in the trial.
The patients were confirmed as having Alzheimer's
disease based on the following criteria:

* A standardized interview with a care-giver using
IQCODE, an ADL scale and a scale measuring
behaviour of the patient (Green scale).

* Neuropsychological evaluation using MMSE, Clock
drawing test, Trailmaking test A and B (TMT A and
B), Kendrick object learning test (visual memory
test), part of the Wechsler battery and Benton
test.

* A psychiatric evaluation using scales for detection
of depression, MADRS for interviewing the patient
and Cornell scale for interviewing the care-giver.

* A physical examination.

* Laboratory tests of blood samples to rule out other
diseases.

* CT scan of the brain.

* SPECT of the brain.



The mean age of the patients was 72.3 with an age range
of 69-76. The mean MMSE score was 22.0 (the maximum
score attainable being 30).

Six age-matched individuals without diagnosed
Alzheimer's disease were used as a control. All had
been tested with MMSE and had a minimum score of 28
(mean: 28.4). The mean age of the normal control group
was 73.0 and the age range 66-81. A sample from a
16-year-old individual, with a consequent minimal chance of
having Alzheimer's disease, was also included as an
additional control.

Using the methods described above (except that
hybridization to 758 rather than 1435 cDNA clones was
performed), informative probes were selected based on
the occurrence criterion and used to construct a
classification model. The results of the classification
model based on probes appearing at least once in the
generated sets after the method to isolate informative
probes as described above are shown in Figure 3, in which
it will be seen that the expression pattern of these
genes was able to classify individuals with or without
Alzheimer's disease into distinct groups. In this
Figure PC1 and PC2 indicate the 2 principal components
statistically derived from the data which define the
systematic variability present in the data. This allows
each sample, and the data from each of the informative
probes to which the sample's cDNA was bound, to be
represented on the classification model as a single
point which is a projection of the sample onto the
principal components - the score plot.

The ability of the generated model, based on isolated
informative probes, to predict future samples was
determined by the double cross-validation. The
performance of the diagnostic test for Alzheimer's
disease is presented in Table 7.

Appendix A

Partial Least Squares regression (PLSR)

Let a multivariate regression model be defined as:

$$Y = XB + F$$

where

X is an N×P matrix of predictor variables (P genes
measured on N objects);

Y (N×J) contains the J predicted variables. In our case Y
represents a matrix containing dummy variables;

B is a matrix of regression coefficients; and

F is an N×J matrix of residuals.

The structure of the PLSR model can be written as:

$$X = TP^T + E_A, \quad \text{and}$$

$$Y = TQ^T + F_A,$$

where

T (N×A) is a matrix of score vectors which are linear
combinations of the x-variables;

P (P×A) is a matrix with the x-loading vectors $p_a$ as
columns;

Q (J×A) is a matrix with the y-loading vectors $q_a$ as
columns;

$E_A$ (N×P) is the deflated matrix for X after A factors; and

$F_A$ (N×J) is the deflated matrix for Y after A factors.

The criterion in PLSR is to maximize the explained
covariance of [X,Y]. This is achieved by the loading
weights vector $w_{a+1}$, which is the first eigenvector of
$E_a^T F_a F_a^T E_a$ ($E_a$ and $F_a$ are the deflated X
and Y after a factors or PLS components).

The regression coefficients are given by:

$$B = W (P^T W)^{-1} Q^T$$

where W is the matrix with the loading weights vectors
$w_a$ as columns.

A PLSR model with full rank, i.e. maximum number of
components, is equivalent to the MLR solutions. Further
details on PLSR can be found in Martens & Naes, 1989,
Multivariate Calibration, John Wiley & Sons, Inc., USA
and Kowalski & Seasholtz, 1991, supra.
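
For a single response and a single component, the quantities
defined in this Appendix can be computed directly. The sketch
below is a simplified illustration, not a full PLSR
implementation: it assumes X and y are already centred, it
extracts only one component (A = 1), and the function name and
demo data are hypothetical. With one response, the first
eigenvector of X^T y y^T X is simply X^T y scaled to unit
length.

```python
from math import sqrt

def pls1_one_component(X, y):
    """One-component PLS for a single centred response y.
    X: list of centred rows (N objects x P variables).
    Returns (w, p, q, B) with B = w (p^T w)^-1 q."""
    n, n_vars = len(X), len(X[0])
    # Loading weights: X^T y normalised to unit length.
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(n_vars)]
    norm = sqrt(sum(v * v for v in xty))
    w = [v / norm for v in xty]
    # Scores t = X w.
    t = [sum(X[i][j] * w[j] for j in range(n_vars)) for i in range(n)]
    tt = sum(v * v for v in t)
    # x-loadings p = X^T t / (t^T t) and y-loading q = y^T t / (t^T t).
    p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(n_vars)]
    q = sum(y[i] * t[i] for i in range(n)) / tt
    # P^T W is a 1 x 1 matrix here, so its inverse is a division.
    pw = sum(pj * wj for pj, wj in zip(p, w))
    B = [wj * q / pw for wj in w]
    return w, p, q, B

# Hypothetical demo with one predictor: one component is full rank,
# so B should equal the ordinary least-squares slope.
w1, p1, q1, B1 = pls1_one_component([[-1.0], [0.0], [1.0]], [-2.0, 0.0, 2.0])
# Hypothetical demo with two centred predictors.
w2, p2, q2, B2 = pls1_one_component(
    [[1.0, 2.0], [-1.0, 0.0], [0.0, -2.0]], [1.0, -1.0, 0.0])
```

The one-predictor demo illustrates the statement above that a
full-rank PLSR model is equivalent to the MLR solution; for
more components the deflation of X and y and a general matrix
inverse would be needed.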
Example 3: Validation of Example 1, diagnosis of breast
cancer

The results in Example 1 were validated by using the
informative probes identified in Example 1 on new breast
cancer and control samples.

Methods 

The methods, essentially as described in Example 1, were 
used. Blood was taken from patients as described in 
Table 8. However, blood was collected in PAXgene tubes 
and the first strand labelled cDNAs were hybridized to 
719 cDNAs spotted on nylon membranes along with other 
controls as described in Example 1. After background 
subtraction using control spots, the data of each 
membrane was normalized using the inter quantile range. 
The data was analysed as described in Example 1 and the 
model validated by cross validation. 
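
As an illustration only, the background subtraction and normalization step could be sketched as follows. The specification does not state how the background level or the range is computed, so this sketch assumes that the background is the median of the control spots and that the range is the difference between the 75th and 25th percentiles of the corrected signals; the function name and the numbers are hypothetical.

```python
import numpy as np

def normalize_membrane(spot_signal, control_signal):
    """Subtract a control-spot background, then scale by the 25th-75th percentile range."""
    spot_signal = np.asarray(spot_signal, dtype=float)
    # Assumption: background level = median intensity of the control spots
    background = np.median(np.asarray(control_signal, dtype=float))
    corrected = spot_signal - background
    # Assumption: "range" = spread between the 25th and 75th percentiles
    q25, q75 = np.percentile(corrected, [25, 75])
    return corrected / (q75 - q25)

# Made-up intensities for one membrane and its control spots
membrane = np.array([120.0, 340.0, 560.0, 210.0, 980.0, 450.0, 300.0, 150.0])
controls = np.array([95.0, 105.0, 100.0])
normalized = normalize_membrane(membrane, controls)
```

Scaling each membrane by its own range in this way puts membranes with different overall signal intensity on a comparable scale before the multivariate analysis.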

The 719 cDNAs which were spotted are a subset of the 
cDNAs spotted in Example 1 and include 111 cDNAs 
described in Table 2 and which were found to be 
informative in Example 1. 

Results 

The results are shown in Figures 4 to 9. Figures 4, 6 
and 8 are projection plots similar to Figure 2. Figure 4 shows 
the projection of normal and breast cancer patients' 
samples onto a classification model generated using all 
719 cDNAs. Figure 6 is similar but uses a classification 
model generated with the 111 probes common to Example 1. 
Figure 8 uses the 345 sequences of the 719 for which 
sequence information is provided herein. In each case 
classification of normal and breast cancer groups was 
possible. Figures 5, 7 and 9 show prediction plots 
which reflect the ability of the generated models to 
correctly diagnose breast cancer. In the 3 prediction 
plots shown, the disease samples appear on the x axis at 
+1 and the non-disease samples appear at -1. The y axis 
represents the predicted class membership. During 
prediction, if the prediction is correct, disease 
samples should fall above zero and non-disease samples 
should fall below zero. In each case almost all samples 
are correctly predicted. 
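
The decision rule used to read the prediction plots can be written as a minimal sketch. The predicted values below are made up for illustration and are not data from the Figures; treating a value of exactly zero as non-disease is an assumption, since the text does not say how such a value is handled.

```python
def classify(predicted):
    """Assign a class from the sign of the predicted class membership."""
    return ["disease" if y > 0 else "non-disease" for y in predicted]

def fraction_correct(predicted, true_labels):
    """Share of samples whose predicted sign matches the true label (+1 or -1)."""
    hits = sum((y > 0) == (label > 0) for y, label in zip(predicted, true_labels))
    return hits / len(true_labels)

# Hypothetical example: three disease samples (+1) and three controls (-1)
true_labels = [1, 1, 1, -1, -1, -1]
predicted = [0.8, 0.3, -0.1, -0.9, -0.4, -0.2]

calls = classify(predicted)
accuracy = fraction_correct(predicted, true_labels)
```

In this made-up example the third disease sample falls below zero and is therefore misclassified, while the other five samples fall on the correct side of zero.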

Example 4: Validation of Example 2, diagnosis of 
Alzheimer's 

The results in Example 2 were validated by using the 
informative probes identified in Example 2 on new 
Alzheimer's patient samples. 

Methods 

The methods, essentially as described in Example 2, were 
used. Twelve female patients diagnosed with Alzheimer's 
disease at the Memory Clinic at Ulleval University 
Hospital who were confirmed as having Alzheimer's 
disease based on the criteria of Example 2 were used in 
the trial. The mean age of the patients was 72.3 with 
an age range of 66-83. The mean MMSE score was 22.0 
(the maximum score attainable being 30). 

Sixteen age-matched female individuals without diagnosed 
Alzheimer's disease were used as the normal control 
group. All had been tested with MMSE and had a minimum 
score of 29. The mean age of the normal control group 
was 74.0 and the age range 66-86. 

After transfer of the blood to PAXgene tubes, total mRNA 
was isolated from the blood of the Alzheimer's disease 
patients and of the control group donors according to the 
manufacturer's instructions (PreAnalytiX, 
Hombrechtikon, Switzerland). The isolated mRNA was 
labelled during reverse transcription in the presence of 
α-33P-dATP, yielding a labelled first strand cDNA. 
Hybridization was performed as described previously onto 
730 cDNA clones picked from a cDNA library from whole 
blood of 550 healthy individuals without knowledge of 
the gene sequence of the random cDNA clones. 

Results 

The results are shown in Figures 10 and 11. Figure 10 
is a projection plot generated using 520 probes which 
have been sequenced. Figure 11 is a prediction plot and 
shows correct prediction of almost all samples. 
Table 1a 

List of probes informative for disease diagnosis 



Clone ID 


Sequence ID 


No. of 

nucleotides 


1 


1-01 


- 


- 


2 


1-02 


! 




3 


1-13 


- 


* 


4 


1-21 


- 


- 


5 


I-24 


308 


373 


6 


t-28 


310 


564 


7 


I-30 


1180 


622 


6 


I-34 


313 


554 


9 


H37 


m 




10 


t-42 


w 


m 


11 


I-52 




m 


12 


1-64 


1181 


155 


13 


1-58 


326 


554 


14 


1-71 


— 


i 


15 


1-72 


m 


■■» 


16 


1-86 




«• 


17 


1-95 






18 


11*03 


361 


coo 


19 


11-05 


363 


CIO 


20 


11-06 


364 


528 


21 


11-10 


368 


Ann 


22 


11-24 


381 


534 


23 


11-25 




AAA 
OHM 


24 


It oc 


ooo 


OOO 


ZD 








26 


11-34 


391 


566 


27 


11-41 


397 


534 


26 


II-42 


398 


512 


29 


M-47 




• 


30 


II-57 


411 


505 


31 


11-61 


415 


596 


32 


11-69 


423 


387 


33 


11-70 


424 


420 


34 


11-75 


429 


535 


35 


11-83 






36 


11-84 


438 


577 


37 


11-87 


441 


552 


36 


11-88 


442 


606 


39 


11-90 






40 


11-94 


448 




41 


111-02 


453 


747 


42 


111-05 






43 


111-06 


458 


682 
44 


111-08 


460 


536 


45 


111-10 


- 


- 


46 


111-13 


464 


615 


47 


IIM5 


- 


- 


48 


111-17 


- 


- 


49 


lli-20 


1183 


479 


50 


IU-23 


473 


694 


51 


III-26 


476 


476 


52 


III-35 


485 


551 


53 


llt-39 


487 


224 


54 


IM-40 


488 


349 1 


55 


IMS 


490 


382 


56 


IM4 


491 


382 


57 


IH-53 


500 


390 


56 


UI-56 


503 


109 


59 


HI-57 


504 


374 


60 


III-60 


- 


- 


61 


III-60 


- 


- 


62 


III-61 


507 


521 


63 


111-63 


509 


575 


64 


111-68 


« 


- 


65 


111-74 


518 


502 


66 


111-80 


523 


585 


67 


111-82 


- 


- 


68 


111-85 


526 I 


516 


69 


111-89 


530 


660 


70 


111-92 




m 


71 


111-96 


- 




72 


1V-14 


684 


545 


73 


IV-15 


1185 


628 


74 


IV-23 


- 


• 


76 


IV-26 


1186 f 


494 


75 


IV-26 


- 


- 


77 


IV-29 




- 


78 


IV-31 


687 


268 


79 


IV-32 


688 


569 


80 


IV-34 


- 


- 


81 


IV-35 


m 


- 


82 


IV-41 


I 


- 


83 


IV-45 


- 


- 


84 


IV-53 


61 


362 


85 


IV-62 


- 


- 


86 


IV-69 


192 


286 


87 


IV-80 


701 


579 


88 


IV-82 


- 


- 


89 


IV-93 


- 


- 


90 


IX-10 


736 


641 


91 


IX-12 


• 




92 


IX-38 


757 


ran 

583 


93 


IX-39 


758 


424 


. 94 


IX-42 






95 


IX-48 


764 


626 


96 


IX-77 


785 


556 


97 


V-01 




*• 


98 


V-02 






99 


V-03 


706 


496 
100 


V-U4 


iVr 


wi*7 


101 


\l AC 






102 


\/J\7 

V-t// 


/Uo 




lOo 


v- 1 1 1 


1 100 


coo 

oyy 


104 


V/4 0 


f 11 




lOo 








1UO 


V/-17 
V"* I / 








V % 














"too 


V 






11U 








in 








lie 








1 IO 


V~*fo 






114 


V~<tr 






lib 








116 


\/ CO 






-a -4—9 

117 


V-&4 






11B 




-y~y 
/ / 




119 


\/ CO 






120 


V-59 






121 


V-65 






122 


V-68 


: 




123 


V-71 






124 


V-75 






125 


V-79 






126 


V-ov 


72o 


OCA 


127 


V»90 






12o 


v-y 1 








v*yfi 






I wU 


\fjQA 




* 


IO 1 


v 






1^.0 


Vl-TLl 


www 


122 


iqq 
■ ww 


X/l-07 
VI wr 


Q<t 

WW 












iQC 








I Ov» 




869 


667 

WW* 


147 
ly/ 


VU14 


B71 

w* ■ 


642 










10a 


V 1 fcW 


876 

U r W 


115 


1 AT) 


VI-21 

VI Cm I 






iff 1 


VI-23 
v 1 to 


B7B 

w r w 


634 


IAS? 








1W 


V 1 *T 1 








V 1 *Tfc. 






l*rw 










VI-44 






id7 


V/l-48 


891 


626 


14a 


Vt-49 






149 


VI-50 


693 


565 


150 


Vl-52 




• 


151 


Vl-53 


895 


560 


152 


VI-5S 


897 


509 


153 


VI-65 


» 




164 


VI-70 


108 


550 


155 


Vt-71 
156 


Vl-72 


• 




157 


VI-74 


905 1 


655 


158 


VI-76 


907 


582 


159 


VK76 


— 




160 


Vl-79 


— 




161 


VI-84 






162 


VI-87 


911 


595 


163 


VI-88 


912 


651 


164 


VI-90 


I 




165 


Vl-93 






166 


% it nr- 


915 


2o0 


167 


VI-96 






168 


VH-02 






169 ! 


Vll-03 


1196 


412 


170 


VIK>6 






171 


VII-10 






172 


V1M1 


— 




173 


VII-15 


1199 


439 


174 


VIM 9 


562 


580 


176 


VII-21 


564 


671 


176 


VII-25 


- 


— 


177 


VH-32 


571 


457 


178 


Vll-36 


675 


209 


179 


Vll-39 


576 


541 


180 


VII-42 


579 


502 


181 


VJf-43 


580 


316 


182 


VI M6 


583 


631 


183 


VII-47 


1200 


526 


184 


VII-48 


1201 


613 


186 


VH-59 


593 


565 


186 


Vil-60 


m 




187 


VII-63 


595 




188 


VI I -66 


598 




169 


%/ti err 

VI I -67 






190 


% it 1 "r^% 

VI 1-72 


600 


CQC 

■ O 90 


191 


\ tii 

VI 1-73 


601 




192 


% /if >7C 






193 


Vlh76 


OU3 • 




194 


| VIK77 


12U3 




195 


% tit nn 

VII-80 


DUD 


ooo 


196 


VM-81 


606 


ODD 


197 


VH-83 






198 


VII-66 


«• 




199 


VlI-88 






200 


VH-90 


Dl<i 


Of D 


201 




C1Q 
OlO 


«3*r J 


202 


\ttt 

VI I -93 






203 


will r\A 








Vlll-02 






205 


Vltt-03 






206 


VIII-06 






207 


VIII-09 


618 


598 


208 


VIII-10 






209 


VIIM5 






210 


Vlll-20 


628 


419 


211 


VI 11-22 


m 
212 


VHI-26 




** 


213 


Vlll-28 


634 


511 


214 


VIII-29 


635 


592 


215 


Will C\/\ 


636 


572 


216 


« /1 1 1 **» A 


637 


482 


217 


1/1 II 41 

VIII-32 


638 


545 I 


218 


VMKJ3 


con i 

boy 


624 


219 


VIU-39 






220 


Will 4 


o45 


649 


221 


\ /til >io 




bUU 


222 


Will ji 






223 


VIII-46 


©49 


425 


224 


»/iii ^ftt 


651 


251 


225 


Vlll-58 






226 


V1IM54 


663 


627 


227 


Vlll-65 


*™ 




226 


VIH-66 


665 


345 


229 


VHI-67 


666 


252 


. 230 


Vltl-74 






231 


VIII-76 


675 


591 . 


232 


Vlll-78 


• 


*■ 


233 


VHI-82 


** 


M 


234 


VIII-83 


*~ 




235 


Vlll-85 


- 




236 


Vlil-87 


m 




237 


VIII-91 


*" 


** 


236 


VIN-92 






239 


Vlll-93 


Pi 


: 


240 


VIII-95 




■ .7 


241 


X-04 






242 


X-07 


808 


Wr\ 


243 


X-15 


Di /I 


iqo 


244 






^*7fl 

o/u 


245 


A-34 






246 








247 


X-54 


o3/ 


ATI'S 


248 


A-OO 


DOS 


"7*1 


249 


X-68 


-4 orvr 




250 


X-7Z 






251 


A-94 


obO 


OU 1 


252 


XK>7 






253 






OcXf 


254 


XI-50 






255 


VI CO 






256 


VI 0<4 

XI-81 


i bio 
idle. 


Of *r 


257 


VII /VT 

XI 1-07 






258 * 


\ VII 1*7 

XIM7 






259 


VII 






260 


Xlt-27 






261 


XU-31 






262 


Xli-32 






263 


XII-35 


1214 


620 


264 


XIK36 






265 


XII-52 






266 


XII-59 . 


1216 


484 


267 


XIIM9 


1219 


SS9 
268 


XI 11-29 






269 


XI 11-52 


939 


513 


270 


XI 1 1-62 




~Z 


271 


XIII-84 






272 


Xlll-92 




/Hi 


273 


VI / -a 0 

XV-lo 






274 


XV-22 


1099 


obi 


275 


Xv-24 






276 


XV-25 






277 


XV-Zo 






278 | 


XV-34 






279 


AV-42 






280 


XV-68 






281 


XV-74 






. 282 


XV-93 






283 


XV-94 






284 


XV-96 






285 


XY1-36 


1056 


435 


286 


XV1-53 


1230 


741 


207 


XVI-59 


™ 


■ 


288 


XVl-66 


1074 


689 


289 


XVI-76 


1083 


198 


290 


XVI-77 


1084 


198 


291 


XVII-07 






292 


XVH-08 


: 




293 


XVIM7 






294 


XVH-28 


1 




295 


XVI 1-29 






296 


XVll-31 


1 ldSI 




297 


XVH-36 






298 


XVI 1-39 






299 


XVI 1-40 


1*5 •Si 




300 


VI rl 1 in 


1 l*K> 


00/ 


301 


XVII-55 






302 


XVII-58 




» 


303 


XVll-67 






304 


XV1I-72 






305 


XVI 1-76 


1160 


650 


306 


XVH-62 






307 


XVII-87 


1165 


502 


308 


XVIl-95 


1172 


648 1 
Table 1b 

List of sequences of probes informative for disease diagnosis 



Please see the note at the bottom 



Clone ID 


Sequence ID 


1-09 


298 


1-10 


299 


1-13 


1331 


1-14 


1178 


1-15 


300 


1-16 


301 


1-17 


302 


1-19 


304 


1-20 


305 


1-22 


306 


1-23 


307 


1-24 


308 


1-25 


309 


1-28 


310 


1-30 


1180 


1-31 


311 


I-32 


312 


I-34 


313 


I-37 


1440 


I-38 


314 


I-39 


315 


I-40 


316 


I-42 


1332 


I-44 


317 


I-45 


318 


1-46 


319 


I-47 


320 


I-48 


321 


I-49 


322 


I-53 


X 323 


I-54 


1181 


I-56 


324 


I-57 


325 


I-58 


326 


I-60 


327 


I-64 


328 


I-67 


330 


I-69 


331 


1-71 


332 


i-72 


1 333 


I-73 


334 


I-77 


335 


I-79 


| 336 


I-80 


337 
I-81    338 
I-82    339 
I-86    1336 
I-88    1182 
I-95    1337 
II-02    360 
II-03    361 
II-05    363 
II-06    364 
II-07    365 
II-08    366 
II-09    367 
II-10    368 
II-11    369 
II-12    370 
II-13    371 
II-14    372 
II-15    373 
II-16    374 
II-17    375 
II-18    376 
II-20    377 
II-21    378 
II-22    379 
II-23    380 
II-24    381 
II-25    382 
II-26    383 
II-27    384 
II-28    385 
II-29    386 
II-30    387 
II-31    388 
II-32    389 
II-33    390 
II-34    391 
II-35    392 
II-37    393 
II-38    394 
II-39    395 
II-40    396 
II-41    397 
II-42    398 
II-43    399 
II-44    400 
II-46    401 
II-47    402 
II-48    403 
II-49    404 
II-50    405 
II-52    406 
II-53    407 
II-54    408 
II-55    409 
II-56    410 
II-57    411 
II-58    412 
II-59    413 
II-60    414 
II-61    415 
II-62    416 
II-63    417 
II-64    418 
II-65    419 
II-66    420 
II-67    421 
II-68    422 
II-69    423 
II-70    424 
II-71    425 
II-72    426 
II-73    427 
II-74    428 
II-75    429 
II-76    430 
II-77    431 
II-78    432 
II-79    433 
II-80    434 
II-81    435 
II-82    436 
II-83    437 
II-84    438 
II-85    439 
II-86    440 
II-87    441 
II-88    442 
II-89    443 
II-90    444 
II-91    445 
II-92    446 
II-93    447 
II-94    448 
II-95    449 
II-96    450 
III-01    452 
III-02    453 
III-03    454 
III-04    455 
III-05    457 
III-06    458 
III-07    459 
III-08 


460 


111-09 


461 


111-11 


462 


[IH-12 


463 


111-13 


464 


111-14 


465 


111-15 

Ml 1 w 8 


466 


I1IM6 

llll lv 


467 


111-17 
jut— i # 


468 


111-18 


469 


111-19 

|lll— lw 


470 


j||j.20 


1183 


1111-21 

III! Mm. 1 


471 


II 1-22 

[III CC- 


472 


111-23 


473 


• 111-24 

llll mmT 


474 


111-25 

(III £.*J 


475 


(111 mm\J 


476 


IIU27 


477 


|l II Am*** 


478 


1111-29 

[III £m9 


479 


111-31 

fill w 1 


481 


111-32 

nil 


482 


111-33 


483 


111-34 
■in 


484 


111-35 

Jill 


485 


1111-37 

|1 HO 1 


486 


111-39 

II II W« 


487 


111-40 

[III tw 


488 


IH-42 

|| II^T*. 


489 


ni-43 

[II 1 


490 




491 


111-45 


492 


111-46 


493 


111-47 


494 


111-48 


495 


111-49 


496 


111-50 


497 


111-51 


498 




III-52 


499 




III-53 


500 




IU-54 


501 




III-55 


502 




III-56 


503 




III-57 


504 




III-58 


505 




lil-59 


506 




111-61 


507 




IH-62 


508 




III-63 


509 




llt-64 


51 
III-65 
UI-66 
111-67 
111-69 
111-70 
111-71 
jtl l-73 
&ll-74_ 

HlT-77 
tliT-78 

Im-so 

111-81 



III-82 



lil-83 



III-85 



III-86 



II-87 



III-88 



Hl-89 



111-91 



lli-92 



HII-93 



Hl-94 



III-95 



IH-96 



lV-02 



IV-04 



1V-13 



IV-14 



1V-15 



IV-17 



lV-23 



IV-26 



IV-28 



IV-31 



IV-32 



IV-35 



IV-37 



IV-38 



lV-40 



IV-42 



IV-43 



lV-44 






511 1 

51; 

51 3| 
514 
515 
516 
"517^ 
518 
519 
520 
521 



523 
~52M 



134E 
525I 



526 



527 



528 



529 



530 



531 
1351 



532 



533 



534 



535 



681 



682 



683 



684 



1185 



685 



1353 



118C 



686 



687 



688 



1355 



|g6 



689 



690 



691 



123S 



692 



lV-47 


693| 


IV-53 


61 


IV-55 




lV-56 


695 


IV-61 


f!~ 696 


IV-64 


I 697 
IV-65 


698 


IV-69 


192 


IV-72 


699 


IV-73 


700 


IV-80 


701 


IV-82 


196 


IV-85 

1 1 V 


702 


IV-93 

II W ww* 


703 


IV-95 

II W WW' 


704 


IV-96 

II V~wW 


705 




736 


IX-12 


738 




739 


IX-24 


747 


IX-38 


757 


IX-39 


758 


IX-48 


764 


IX-^O 


766 




768 


IX-62 


773 


IX-65 


776 


IX-72 


782 


IX-77 


785 


IX-91 


796 


IX-96 


801 


\ 


/-01 


1361 




/-03 


706 




/-04 


707 




✓-07 


708 




✓-08 


709 


> 


✓-09 


710 




✓-11 


1 1188 


I 


✓1-16 


873 




V1-19 


875 




V-12 


711 




V-17 


T 1364 




V-18 


712 




V-20 


713 




V-24 


' T— 714 




V-25 


i 1365 




V-28 


1189 




k/-35 


1366 




K/-37 


T 716 




K/-38 


1190 




K/-39 


1109 




K/-40 


717 




K/-41 


718 




K/-47 


1368 




K/-48 


719 




V-49 


1369 




|v-55 


I 77 
V-57 


*7r>n 


V-58 


4 Q*7f* 

lo/U 


V-61 


721 


V-64 


f zz 


V-65 


"TOO 

723 


V-68 


A A A O 

1448 


V-71 


1495 


V-74 


724 


V-75 


A 0 70 

1372 


V-80 


726 


V-81 


727 


V-87 


728 


V-90 


1374 


VI-02 


o vi n 

340 


Vl-03 


O J 4 

341 


VI-04 


342 


VI-06 


343 


VI-07 


344 


VI-08 


345 


Vl-09 


346 


VI-11 


347 


Vl-12 


869 


VI-13 


870 


VI-14 


871 


VI-16 


oto 

873 


VM8 


n in 

348 


VI-19 


349 


VI-20 


350 


VI-21 


351 


VI-22 


352 


VI-23 


878 


VI-24 


879 


VI-25 


O CO 

353 


VI-26 


ATI 

354 


VI-27 


o c c 

355 


VI-31 


356 


VI-32 


DOC 

885 


VI-33 


o c^ 

357 


VI-35 


O CO 

358 


VI-39 


e% o ~7 

887 


VI-43 


" ! ~~~ «4 O O O 

1382 


VI-44 


1193 


VI-45 


889 


VI-48 


359 


Vl-49 


892 


% it en 

VI-50 




VI-53 


895 


VI-55 


897 


VI-58 


899 


VI-66 


903 


VI-67 


904 
VI-70 


108) 


VI-71 

VIII 


1387^ 


VI-74 


905 


VI-75 

V 1 i w 


906 


VL76 

vi i u 


907 


VI-77 

V l~ # * 


110 


V1-79 


1389 


VL80 

V 1 WW 


908 


VI-85 

V 1 WW 


910 


VI-87 

V 1 w # 


911 


Vl-ftft s 
vr-oo 


912 


vi-qo 

V l~c/w | 


1390 




1391 


VI-Q5 


9151 


\/l Oft 


1392| 


V1I.09 


547^ 


Vll~ww 


548 


\/l 1-04 


549 


VI 1-05 

V 1I*"WW 


550 


Vll-06 

V I ruu 


551 


V 1 1 w / 


552 


VII-OR 
Vll~wO 


553 


VII-OQ 


554 


VI 1-10 


555 


VI 1-1 1 


556 


VI 1-1? 


557 


Vll-1 A 

V 1 1~ 1 H 


558 


Vll-1 5 

V 1 1 1 w 


559 


Vll-1 7 

V 1 1*" 1 • 


560 


VII io 


561 


Vll-1 Q 


562 


VI 1-90 


563 


VIU21 

V 1 1 A. 1 


564 


Vll-99 


565 


VI 1-9** 

V W-/LG 


566 


VI 1-94 




VI 1-9 5 






250 


VI 1-97 


568 


V 1 l-^w 


f 569 


VI 1-9 Q 


570 




571 


**** 


1 572 


VII *\A 


573 


VII-35 


574 


VII-36 


575 


VII-39 


576 


VII-40 


577 


VII-41 


578 


VII-42 


579 


VII-43 


580 
VII-44 


581 


VH-45 


* 582 


VI 1-46 


583 


VII-47 


1200 


Vll-48 


584 


VI 1-49 


585 


VI 1-50 


586 


VII-52 


587 


VII-53 


588 


VII-54 


589 


VI 1-55 


590 


VII-57 


591 


VII-58 


592 


IVII-59 


593 


[VI 1-62 


594 


|vil-63 


595 


jVll-64 


596 


|VII-65 


597 


(Vll-66 


598 


|vil-67 


1399 


I VI 1-71 


599 


|VII-72 


600 


lvil-73 


601 


VII-74 


602 


VII-76 


603 


Jvil-77 


604 


|vil-80 


605 


IVII-81 


606 


Vll-82 


607 


|vil-83 


608 


1VII-84 


609 


VII-86 


1453 


VII-87 


r 610 


VII-89 


611 


k/ll-90 


612 


VI 1-91 


613 


VII-92 


614 


|VII-93 


615 


VII-94 


I 616 


|vil-96 


^~ 617 


|vill-09 


618 


lVlll-1 0 


619 


lVIII-1 1 


620 


lVIII-12 


621 


VIII-13 


622 


VIII-15 


623 


VI 11-1 6 


624 


VIII-17 


625 


VIII-18 


626 


VIII-19 


^ 627 


|Vlllr20 


628 
VIII-21 


629| 


Will OO 


145S| 


Vlll-2o 


63CM 


\/tll o>i 

VI 11-24 


631 


will OC 


632 


\ /III Oft 
VIII-20 


1456 


Will 0*7 

VI 11-27 


633 


Will 00 

vlll-20 


6341 




635| 


VIII-OU 


636] 


win 0*1 
VIII-31 


637 


Will 00 


638 


VIII-oo 


639 


(V/lll *1A 

|VIII-o4 


6401 


K/111 oc 


641 


1 V /111 0."7 

VIU-37 


642 


k/111 op 


643] 


I\/iii An 
|VIII-4u 


644 


!\/lll A A 


645| 


K/111 /o 
|Vlll-42 


646 


Vlll-4«3 


647] 


k/lll AK. 

|VIII-4D 


648 


[Will Ad 


6491 


K/lll A—f 

M 11-47 




K/lll A Q 

[VIM-40 


6511 


I\/iii cn 
jVlII-OO 


652 


lv /tit C«4 

IV 11 1-51 


653 

www 


K/lll CO 

VIII-53 


654 


K/111 c>i 
IVI 11-54 


655 


Wilt cc 


656 


k/lll EC 


657 


K/lll C7 

VI ll-O/ 


658 


k/m ca 


659| 


K/111 cQ 


660) 


K/lll CtCi 

VIII-oU 


661 


K/111 e a 
VI 11-61 


662 


K/111 a A 
VHI-D4 


663| 


K/lll cc 

[Vlll-bO 


664 


K/111 act 
VIII-OO 


665] 


k/lll CTT 

IVill-o/ 


666 


Vm-00 


667| 


K/m co 

vui-oy 


668 


K/m "70 

VI 11-70 


669 


K/lll *7 A 


R70 


K/lll T O 

VI 11-72 


671 


k/IIU73 

1 VIII # 0 


^ 672 


vin-74 


673 


Mii-75 


674 


VIII-76 


675 


MII-77 


676 


Vlll-78 


677 
VIII-79 


678 


vlll-OU 


679 


A-U/ 


808 


Y i K 
A- 1 O 


814 


v on 


817 


Y OQ 


821 


*y ia 


825 




833 


Y **A 


837 


Y Cfi 

A*%K> 


839 


A-DO 


1207 


A-# j£ 


849 


A-/0 


1208 


A-»4 


860 


Ah \£ 


1209 


Yl 17 


1460 


Yl A1 

AI-4o 


1210 


VI ft7 


1211 


Yl Ri 


1212 


Yll IY7 


1213 


VII m 


1214 


II ift 


1215 


Ytl ftQ 


1216 


Yll ftA 


1028 


VII QO 


1217 


viii ni 


~ 917 


Vlll OA 
AIII-U4 


1218 


Yin -1 q 
AW- 1 » 


1219 


Ylll OA 
All 1-^4 


926 


Ylll A4 
Aili-O 1 


938 


Vlll CO 


939 


Ylll-fi7 


T 947 


Ylll ftO 

Aiii-oy 


949 


Ylll ftft 
AIII-OO 


1220 


Vlll QO 
AI1I-9Z 


1221 


Y\/ OO 


1099 


V\/ OA 


1101 


Y\/ OA 


1224 


V\/ AO 

AV-4Z 


1108 


V\/ CO 


1226 


V\/ ftA 


1118 


V\/ fiA 


112*3 


V\/l «1Q 


1228 


V\/l 1ft 


1056 


XVI-53 


1230 


XVI-60 


1071 


XVl-66 


1074 


XVI-74 


1081 


XVI-76 


^ 1083 


XVI-77 


^ 1084 


XVI1-31 


1139 
XVII-40 


1231 


XVII-48 


1148 


XVil-76 


\~ 1163 


XVII-87 


1165 


XVU-95 


1172 






Note 



Sequences are not available for the following sequence IDs in Table 1, and the 
corresponding sequence IDs in Tables 2 and 4: 



298,301,305,307,312,317,318,319,320,332 ,,333 '^^S^i^S^S^I^ 




0 522 528 531 535 547 548 549,550,551,552,553,554,555,556,557,558,559,573,584,604,608, 
616^2^ 

2 909916 1101 1108,1109,1177,1187,1193,1204,1220,1239,1255,1256,1342,1347,1354, 
1357,1362,1363,1364,1373,1375,1379,1403,1404,1405,1406,1413 
Table 2a 



List of informative probes for 







1-24 








1-30 


1 1 fin 


1-52 




1 1-54 


llOl 


11-41 


337 ! 


11-70 


424 


11-87 


AAA 

441 


111-06 ! 


458 


111-20 


1183 


llt-40 


488 


IU-57 


504 


111-60 


- 


111-61 


507 


HI-89 


530 


IV-14 


684 


IV-15 


1185 i 


IV-26 


11 TO 


iv-32 


oao 


lv-41 




IV-53 




IVH54 




IV/CQ 

iv-oy 




iV-oU 


7rt"l 
/ w I 


i v-o^ 


I9w 


IA- 1U 


73S 








757 


IA"05J 


758 


IA"*WE 




IA"*rw 




i/V f f 


I 785 




1188 






V-39 




V-55 


77 


V-80 


726 


V-94 




Vl-07 


93 


VI-34 




Vt-41 




VW8 


891 


Vl-49 




VI-52 




VI-6S 


897 


VI-65 




VI-70 


108 



diagnosis of breast cancer 



Clone ID 


Sequence ID 


V1-.79 




\/i-7n 




Vi-0*r 




Vlr-Uw 


1 1 BO 


VII* ID 


1 1 


\/||.QO 
Vll-w£ 


Or I 


Vlr«i» 


cr/fz 
DrO 


Vll-47 




1 Vlh4o 




VIM50 




VII-73 


DU 1 ■ 


Wtf 1 y jt 

Vll-77 


4 nrtn 

1203 


VI 1-90 


ol2 


VUI-20 


o2o 


\/IH rif\ 


boo 


VI1I-30 


OwO 


vuksi 


Dor 


V/IILOO 
VIIrNRJ 




\/ill_>4/4 
VIH**H» 




VIII-**D 


w**w 


\t\\\-ASK 


WW 1 


V llrvw 


665 

Www % 


Vlll-74 

VIII"/*T 




VIII.76 
viii-#o I 


675 

w « w 








808 


Ai 1 9 


814 

w it 


A>"Aw 


821 


^rw*r 




X-35 




X-54 

/Vw*r 


837 


X-56 


639 


X-68 


1207 


X-72 


849 


X-94 


860 


XI-07 




Xl-13 


1209 


XI-50 




XI-58 




XI-81 


1212 


XII-07 


1213 


XIH7 




XII-26 


m 


XII-27 




XII-31 




XH-32 




XII-35 


1214 
Clone ID 


Sequence ID 


XII-36 




XII-52 




XII-59 




XIIM9 




XN1-29 




VIII CO 

XJII-52 


OQO 


VIII 




XIII-84 




XI II -92 


4 004 
l<££ 1 


V/l / <4 ft 

I XV-18 




XV-22 


1099 


XV-24 




XV-25 


1224 


XV-28 


** 


XV-34 




XV-42 


"~ 


XV-68 




XV-74 


• 


XV-93 




XV-94 


— 


XV-96 


** 


XVI-36 


1056 


XVI-53 


1230 


XVI-59 


- 


XVI-66 


1074 


XVl-78 


1083 


XV 1-77 


•* f\UA 
1U04- 


XVII-07 




V\/ll no 

XVI rUo 




XVI l-i / 




vwii on 








XVH-ol 




XVIroo 




V\/ll oo 




XVII-40 


1231 


XVU-48 


1148 


XVH-55 




XVH-S8 




XVII-67 




XVII-72 




XVII-76 


1160 


XVII-82 




XVII-87 


1165 


XVII-95 


1172 
Table 2b 

List of sequences of probes informative for breast cancer 

Please see the note at the bottom of Table 1. Some sequences are missing. 



Clone ID 


Sequence ID 


1-1 O 


1331 


1 A A 

-14 


1178 

l i « w 


1-24 


308 

W W 


1-25 


30Q 


I OD 

1-28 


310 


1-30 


1180 


1 0*7 

-37 L 


1440 


1 it «*» 

-42 


1335 


1 A Q 


321 


1-54 


1 181 

1 1 w 1 


1-60 


327 


1-72 


1335 


1-81 


338 


1 QO 

1-82 


339 

WWW 


1 DC 


1336 


| o o 

l-oo 


1 1 W4b 


1-95 


1337 


1 1 no 


360 


11-03 


361 

WW 1 


ll-Ob 


364 


ii n*7 
11-0/ 


365 


ii <i f\ 


368 

wwW 


If O A 

11-21 


378 


1 ■ oo 

11-23 


380 

www 


II 1 yf 

1-24 


381 

WW 1 


U-25 


382 i 

w w«-> 


11-27 


wO*T 


It o o 

1-33 


3Q0 

Www 


IU34 


391 


11-41 


397 


II-42 


398 


II-46 


401 


II-47 


1338 


II-48 


403 


II-52 


406 


II-57 


411 


II-58 


412 


II-59 


413 


II-60 


414 


11-61 


415 


II-62 


416 


II-64 


418 
II-67    421 
II-69    423 
II-70    424 
II-74    428 
II-80    434 
II-82    436 
II-84    438 
II-87    441 
II-88    442 
II-96    450 
III-01    452 
III-02    453 
III-06    458 
III-08    460 
III-12    463 
III-13    464 
III-17    1344 
III-18    469 
III-20    1183 
III-21    471 
III-23    473 
III-24    474 
III-25    475 
III-26    476 
III-27    477 
III-28    478 
III-29    479 
III-32    482 
III-33    483 
III-35    485 
III-39    487 
III-40    488 
III-42    489 
III-45    492 
III-46    493 
III-47    494 
III-48    495 
III-56    503 
III-57    504 
III-58    505 
III-59    506 
III-61    507 
III-62    508 
III-63    509 
III-64    510 
III-66    512 
III-67    513 
III-70    515 
III-74    518 
III-75    519 
III-78    521 
III-80 


523 


111-81 


524 


HI-82 


1348 


m-85 


526 


III-86 


527 


III-88 


529 


hl-89 


530 


M-92 


1351 


IH-93 


532 


III-95 


534 


UI-96 


1352 


IV-04 


682 


IV-13 


683 


IV-14 j. 


684 


IV-15 


1185 


IV-17 


685 


IV-23 


1353 


IV-26 


1186 


IV-31 


687 


IV-32 


688 


IV-35 


1355 


IV-37 


ge 


IV-38 


689 


IV-42 


691 


1V-43 


1239 


IV-47 


693 


IV-53 


61 


IV-61 


696 


IV-64 


697 


IV-69 


192 


IV-72 


699 


1V-80 


701 


IV-82 


196 


IV-85 


702 


IV-93 


1360 


IV-96 


705 


IX-10 


736 


IX-12 


738 


IX-13 


739 


IX-24 


747 


IX-38 


757 j 


IX-39 


758 


IX-48 


764 


IX-50 


766 


IX-56 


T 768 


IX-62 


773 


IX-65 


776 


IX-72 


782 


IX-77 


785 


1X-91 


796 


IX-96 


801 
V-01 


1361 


V-03 


706 


V-04 


707 


K/-07 


708 


K/-08 


709 


V-11 


.1188 


V-12 


711 


V-17 


1364 


V-24 


714 


W-25 


1365 


V-28 


1189 


V-35 


1366 


K/-38 


1190 


K/-39 


1109 


V-41 


718 


N-47 


1368 


K/-49 


1369 


V-55 


77 


K/-57 


720 


K/-58 


1370 


V-61 


721 


p-64 


722 


V-65 


1371 


V-68 


1448 


V-71 


1495 


K/-74 j 


724 


V-75 


1372 


V-80 


726 


K/-90 


1374 


VI-03 


864 


VI-04 


865 


K/l-07 


93 


VI-08 


867 


VI-09 


1378 


M-12 


869 


VI-13 


870 


VI-14 


871 


VI-16 


873 


VI-19 


875 


VI-20 


876 


VI-21 


1380 


M-23 


878 


VI-24 


879 


VI-25 


1192 


VI-26 


881 


M-32 


885 


VI-39 


887 


VI-43 


1382 


M-44 


1193 


Vl-45 


889 


|vi-48 


891 
VI-49 


892 


VI-50 


893 


VI-53 


895 


Vl-55 


897 


VI-58 


899 


VI-66 . 


903 


VI-67 


904 


VI-70 


108 


VI-71 


1387 


VI-74 


905 


VI-75 


906 


VI-76 


907 


VI-77 


110 


Vl-79 


1389 


VI-80 


908 


VI-85 


910 


VI-87 


911 


VI-88 


912 


VI-90 


1390 


VI-93 


1391 


VI-95 


915 


VI-96 


1392 


VII-02 


1195 


Vll-03 


1196 


Vll-06 


1394 


VII-08 


1197 


Vll-09 


1198 


VII-10 


1395 


VII-11 


1396 


Vlf-15 


1199 


VII-17 


560 


VII-19 


562 


VII-21 


564 


VII-22 


565 


VII-23 


566 


VII-24 


567 


VII-25 


1397 


VII-26 


250 


VII-27 


568 


VII-29 


570 


VII-32 


571 


VII-33 


572 


VII-36 


575 


VII-39 


; 576 


VII-41 


578 


VII-42 


579 


VII-43 


580 


VII-46 


583 


VII-47 


1200 


VII-48 


1201 


VII-49 


585 
VII-54 


589 


MI-57 


591 


VII-58 


592 


VII-59 L 


593 


VII-62 


594 


MI-63 


1202 


MI-64 


596 


VI 1-66 


598 


K/ll-67 


1399 


K/ll-72 


600 


VI 1-73 


601 


K/ll-77 


1203 


MI-80 


605 


K/ll-82 


607 


VI 1-86 


1453 


K/ll-87 


610 


K/H-90 


612 


VII-91 


613 


Vtl-92 


614 


MI-93 


615 


VII-96 


617 


VIII-09 


618 


IVIlI-1 0 


619 


vm-13 


622 


VIII-16 


624 


MII-20 


628 


VIII-21 


629 


VIII-22 


1455 


VIII-23 


630 


VIH-24 


631 


VIII-25 


632 


VIII-26 


1456 


VIII-27 


633 


VIII-28 


634 


VIII-29 


635 


VIII-30 


636 


VIII-31 


637 


VIII-32 


638 


VIII-33 


639 


MII-34 


1204 


VIII-38 


643 


VIII-40


644 


VIII-41 


645 


VIII-46


649 


VIII-48


651 


VIII-55 


656 


VIII-57


658 


VIII-59 


660 


VIII-60


661 


VIII-61 


1205 


VIII-64


663 
VIII-66


665 


VIII-73


672 


VIII-74 


673 


VIII-76


675 


VIII-80 


679 


X-07 


808 


X-15 


814 


X-20 


817 


X-29 


821 


X-34 


825 


X-46 


833 


X-54 


837 


X-56 


839 


X-68 


1207 


X-72 


849 


X-73 


1208 


X-94 


860 


XI-13 


1209 


XI-37 


1460 


XI-43 


1210 


XI-67 


1211 


XI-81 


1212 


XII-07 


1213 


XII-35 


1214 


XII-36 


1215 


XII-59 


1216 


XII-65 


1028 


XII-92 


1217 


XIII-03


917 


XIII-04 


1218 


XIII-19


1219 


XIII-24


926 


XIII-51


938 


XIII-52


939 


XIII-67 


947 


XIII-69 


949 


XIII-88 


1220 


XIII-92 


1221 


XV-22 


1099 


XV-24 


1101 


XV-25 


1224 


XV-42 


1108 


XV-62 


1226 


XV-64 


1118 


XV-84 


1125 


XVI-19 


1228 


XVI-36 


1056 


XVI-53 


1230 


XVI-60 


1071 


XVI-66 


1074 


XVI-74 


1081
XVI-76


1083 


XVI-77 


1084 


XVII-31 


1139 


XVII-40 


1231 


XVII-48 


1148 


XVII-76 


1160 


XVII-87


1165 


XVII-95


1172 
Table 3



List of informative probes (Clone ID) selected for breast cancer diagnosis based
on their occurrence criterion during variable selection.



Occurrence* 


Clone ID 


100% 


XI-8^CVI-66,Vm-66,XVI-59,Vn-03,Xm-19pai-35^C-35^CU 

56^ai-26.rV-53,Xin-29,Xin-62,I-30,lU'06^CV-22^CV-94,Vn- 

l5,Vn-39,lX-39^CVH-39J[I-40,Vn-32 


90% 


l-52,VI-65.VT-34JV-62 4 XV-34,XVll-58,V-ll,VI>78par-36 i Xin- 
92,Vm-29,XVI-53^CVI-77^a-l3^an-84,IV-14^ai-31,V-80,Vn- 

48, A V ll-/y, A Vll- IjL 


80% 


111-60. VHl-74,K-12,X-O4^an-52,Vm-30,IX-38 


70% 


VM9«X-29,VBI-48 


60% 


IV-82JX-10,VI-S2,X-68,Vll-77 


50% 


IV-15 


40% 


XV-28,U-70,V-55 


30% 


XVU-17.XVII-67 


20% 


XI-58^CVI-36,Vin-39,VIIl-44,in-6iaV-69 3 XV-68,X-72 


10% 


IX-42,IX-77^C-94,XV-96^CVTJ-55 


5% 


Xn-59,XVT-76,I-54,XV-18,V-94^:-54,VI-07,Vn-47,XVII- 
31»XVn-87,XvH-48 


In at least one model 


11-41 ,VT-41 ,m-57,UI-89,Vn-73^CV-25,IV-26 I X-34,lV-4l,vlI- 

90^-42pCvTI-82^n-27,vm-20a-28,Vn-60 > Vnr-76,in-20 f VI- 

84^-07^CVn-28OT-17,XVn-36,Xll-52,XVn-76,Vffl-46,VI- 

70,XV-74,XV-93,Vm-3 l,n-87,V-39 ,VJ-55,X-07^C-15,XU- 

07,XVn-07,XVn-08^CVU-95J-24JV-32,V-32,VI-48 > VI-72J[V- 

80,rX-48 < X-56 1 XV-24 J XH-32 > XVn-40 



* 100% - Genes appearing in all 75 cross-validated models; 90% - Additional genes
appearing in at least 68 out of 75 cross-validated models; 5% - Additional genes appearing in
at least 4 out of 75 cross-validated models, and so on.
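
The occurrence tiers in the footnote to Table 3 can be illustrated with a short sketch. It assumes that each percentage is converted to a minimum model count by rounding up, which agrees with the two thresholds the footnote states (68 of 75 for 90%, 4 of 75 for 5%). The function names `min_models` and `occurrence_tier` are hypothetical and do not come from the source.

```python
def min_models(percent: int, n_models: int = 75) -> int:
    """Smallest model count reaching `percent` of `n_models`.

    Assumption: the threshold is rounded up (integer ceiling), which matches
    the footnote's 68/75 for 90% and 4/75 for 5%.
    """
    return (percent * n_models + 99) // 100


def occurrence_tier(count: int, n_models: int = 75,
                    tiers=(100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 5)) -> str:
    """Return the highest occurrence tier whose threshold `count` meets."""
    for percent in tiers:
        if count >= min_models(percent, n_models):
            return f"{percent}%"
    # Below the 5% threshold a probe is listed only if it was selected once.
    return "In at least one model" if count >= 1 else "Not selected"


if __name__ == "__main__":
    for pct in (100, 90, 5):
        print(pct, min_models(pct))
```

Because each tier lists only the "additional" genes, a probe belongs to the highest tier it reaches; a probe selected in 68 models falls in the 90% row, whereas one selected in 67 models falls in the 80% row.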
Table 4a



List of informative probes for diagnosis of Alzheimer disease 



Clone ID 


Sequence ID


I-H1 

1MJ 1 












A £- 1 






313 










1 (TO 




1-71 








1-86 




1-95 




11-03 




11-05 


ODO 


u_ne 
IHJO 


ODH 


II- 1U 


ODO 




qoi 






N-96 


wUv 


II..OQ 
H Vv 




ll-J*r 


391 


II-42


398 


11-47 




11-57 


411 


II-61


415 


11-69 


423 


11-75 


429 


11-63 




11-64 


438 


11-88 


442 


II-90




11-94 


448 


111-02 


453 


111-05 




111-06 


458 


111-08 


460 


IIM0 




111-13 


464


HI-15 




111-17 




III-23 


473 


III-26 


476 


III-35 


485 


III-39 


487


III-43 


490 


III-44 


491 


HI-53 


500 


III-56 


603 



Clone ID 


Sequence ID 


III-60




III-63


509 


llt-68 




III-74 


518 




523 


Ui-82 






526 


I1IJQ9 
lll-9fa 








I\/.oq 
1 V fcO 








IXA.OO 




I Y*«3 1 ! 


vwf 






■ v*o^ 




Iw'-KJ 




iv-ftn 


701 






IV 9v 




V-01 




V-02 




V-03 


706 


V-04 


707 


V-06 




V-07 


708 


V-12 


711


VMS 




V-17 




V-21 


_ - 


V-25 




V-35 




V-42 




V-43 




V-47 




V-49 


- 


V-52 




V-54 




V-58 




V-59 




V*65 




V-68 




V-71 




V-75 




V-79 


m 


V-80


726 


V-90 




V-91 




V-92 
Clone ID


Sequence ID 


VI-02 




VI-04 


865 


VI-09 




VI-10 


_ 


VI-12 


889 


VI-14 


871 


VM7 




VI-20 


876 


VI-21 


- 


VI-23 


878 


VI-41 


- 


VI-42 


- 


VI-43 


_ 


VI-44 


m 


VI-48 


891 


VI-49 




VI-50 


893 


VI -53 


895 


Vl-71 




VI-74 


905 


VI-76 


907 


VI-78 




VI-79 




VI-87 


911 


VI-88 


912 


VI-90 




VI-93 




VI-95 


915 


VI-96 




VII-02 




VII-03 


- 


Vll-06 


- 


VH-10 


- 


VIM1 


- 


VII-19


562 


VII-21 


554 


VII-25 


- 


VII-36 


575 


VII-42 


579 


VH-43 


580 


VII-46 


583 


VII-59 


593 


VII-63 


1 595 


Vll-66 


598 


VII-67 




VII-72 


600 


VII-73


601 


VII-75 


- 


VI-02 




Vl-04 


866 


VI-09 




VMO 




VM2 


873 


VI-14 


875 


VI-17 








Clone ID 


Sequence ID 


Vlh91 


813 


VH-93 


615 


VIII-01 


- 


VIII-02 


- 


vm-03 


- 


I VM-06 




VIII-09 


618 


VIII-10 


- 


VIIH5 


- 


VHf-22 


- 


VIH-26 


- 


VIII-28


634 


VIII-30 


636 


VIII-32 


638 


VIII-33 


639


Vlll-41 


645 


VIII-42


646 


VIII-48


651 


VIII-58 


• 


VIII-64 


663 


VIII-65 




VIII-67 


666 


VHI-78 




VIM-82 




VW-B3 




VIW-85 




VIII-87 




VIII-91 




VIII-92 




VIII-93 




VJII-95 
Table 4b

List of sequences of probes informative for Alzheimer disease 



Please see note to Table 1 



Clone id 




1 HQ 


298 


i in 


299 




300 


l- ID 


301 


1 17 


302 




304 

out 


i on 


305 


1 oo 


306 


i oo 


307 


t OA 


308 


1-viO 




1 OP 


310 


i *>-t 
l-o I 


311 


i oo 


312 


1 OA 


313 


1 OP 
WO 


314 


1 OQ 


315 




316 


]_AA 


317 


I/4C 


318 


l-*HD. 


319 


! A7 


320 


1 Aft 


321 


1 AO 


322 


l-Oo 


323 


l-OD 


324 


I-57


325 


1-58 


326 


1-60 


327 


1-64 


328 


1-67 


330 


1-69 


331 


1-71 


332 


1-72 


333 


1-73 


334 


1-77 


335 


1-79 


336 


1-80 


337 


1-81 


338 


I-82 


339 


VI-02 


340 
VI-03


341 


VI-04 


342 


VI-06 


343 


VI-07


344 


VI-08 


345 


VI-09 


346 


VI-11 


347 


VI-18 


348 


VI-19 


349 


VI-20 


350 


VI-21 


351 


VI-22 


352 


VI-25 


353 


VI-26 


354 


VI-27 


355 


VI-31 


356 


VI-33 


357 


VI-35 


358 


VI-48 


359 


II-02


360 


II-03 


361 


II-05


363 


II-06 


364 


II-07 


365 


II-08


366 


II-09 


367 


II-10


368 


11-11 


369 


n-12 


370 


11-13 


371 


11-14 


372 


11-15 


373 


j 11-16 


374 


| 11-17 


375 


11-18 


376 


i H-20 


377 


11-21 


378 


U-22 


379 


II-23 


380 


II-24 


381 


II-25 


382 


II-26 


383 


II-27 


384 


II-28 


385 


II-29 


386 


II-30 


387 


] H-31 


388 


II-32 


389 


li-33 


390 


1I-34 


391 


I II-35 


392 
II-37


393


11-38 


394 


11-39 


395 


11-40 


396 


11-41 


397 


II-42 


398 


II-43 


399 


II-44 


400 


II-46 


401 


II-47 


402 


H-48 


403 


II-49 


404 


II-50 


405 


II-52 


406 


II-53 


407 


II-54 


408 


II-55 


409 


II-56 


410 


II-57 


411 


II-58 


412 


H-59 


413 


H-60 


414 


11-61 


415 


H-62 


416 


H-63 


417 


II-64 


418 


U-65 


419 


II-66 


420 


II-67 


421 


II-68 


422 


II-69 


423 


II-70 


424 


11-71 


425 


II-72 


426 


II-73 


427 


II-74 


428 


II-75 


429 


II-76 


430 


11-77 


431 


11-78 


432 


11-79 


433 


11-80 


434 


11-81 


435 


II-82 


436 


II-83 


437 


H-84 


438 


II-85 


439 


II-86 


440 


II-87 


441 


H-88 


442 


II-89 


443 
II-90


444


11-91 


445 | 


11-92 


446 


11-93 


447 


11-94 


448 I 


11-95 


449 ! 


11-96 


450 


III-01


452


III-02


453


III-03


454


III-04


455


III-05


457


III-06


458


III-07


459


111-08 


460 


111-09 


461 


111-11 


462 


111-12 


463 


m-13 


464 


111-14 


465' 


111-15 


466 i 


111-16 


467 


111-17 


468 


111-18 


469 j 


111-19 


470 | 


111-21 


471 


III-22 


472 1 


IH-23 


473 j 


III-24 


474 


III-25 


475 I 


M-26 


476 1 


IU-27 


477 1 


III-28 


478 ] 


Hl-29 


479 1 


111-31 


T 481 | 


III-32 


482 1 


III-33 


483 


IU-34 


484 


m-35 


485 


III-37 


486 


III-39 


487 | 


III-40 


488 


III-42 


489 


III-43 


490 J 


III-44 


491 


III-45 


492 J 


Hl-46 


493 J 


III-47 


494 j 


III-48 


495 


III-49 


496 j 


III-50 


497 | 






111-51 


498 


III-52 


499 


III-53 


500 


III-54 


501 


III-55 


502 


HI-56 


503 


IH-57 


504 


I11-58 


505 


111-59 


506 


111-61 


507 


UI-62 


508 


III-63 


509 


III-64 


510 


III-65 


511 


IH-66 


512 


III-67 


513 


III-69 


514 


IH-70 


515 


111-71 


516 


III-73 


517 


III-74 


518 


III-75 


519 


III-77 


520 


III-78 


521 


HI-79 


522 


III-80 


523


111-81 


524 


III-83 


525 


ni-85 


526 


III-86 


527 


UI-87 


528 


UI-88 


529 


III-89 


530 


111-91 


531 


III-93 


532


II1-94 


533 


III-95 


534 


IK-96 


535 


VII-02 


547 


VII-03 


548 


VII-04 


549 


VII-05 


550 


Vll-06 


551 


VII-07 


552 


VII-08 


553 


Vll-09 


554 


VII-10 


555 


VII-11 


556 


VII-12 


557 


VU-14 


558 


VII-15 


559 
VII-17


560 


VII-18


561 


VII-19


562 


VII-20


563 


VII-21 


564 


Vlt-22 


565 


VII-23 


566 


VII-24 


567 


VII-27 


568 


Vlt-28 


569 


VII-29 


570 


VII-32 


571 


VII-33 


572 


VII-34 


573 


VII-35 


574 


VII-36 


575 


VII-39 


576 


VII-40 


577 


VII-41 


578 


VII-42 


579 


VII-43 


580 


VII-44 


581 


VII-45 


582 


VII-46 


583 


VII-48 


584 


VII-49 


585 


VII-50 


586 


Vll-52 


587 


VII-53 


588 


VII-54 


589 


VII-55 


590 


VH-57 


591 


VII-58 


592 


VII-59 


593 


VII-62 


594 


VII-63 


595 


VII-64 


596 


VII-65 


597 


VH-66 


598 


VII-71 


599 


VII-72 


600 


VII-73 


601 


VII-74 


602 


VII-76 


603 


Vll-77 


604 


VII-80 


605 


VII-81 


606 


VII-82 


607


VII-83 


608 


Vll-84 


609 


VII-87 


610 
VII-89


611 


VII-90 


612 


VII-91 


613 


VII-92 


614 


VII-93 


615 


VII-94 


616 


VII-96 


617 


VIII-09 


618 


VIII-10 


619 


VIIH1 


620 


vm-12 


621 


VIII-13 


622 


VIII-15 


623 


vm-16 


624 


vm-17 


625 


VIII-18 


626 


VIII-19 


627 


vm-20 


628 


VIII-21 


629 


VIII-23 


630 


VIII-24 


631 


V1II-25 


632 


VIII-28 


634 


VIII-29 


635 


VIII-30 


636 


VIII-31 


637 


VIII-32 


638 


VIII-33 


639 


VIII-34 


640 


VIII-36 


641 


VIII-37 


642 


Vlll-38 


643 


VIII-40 


644 


VIII-41 


645 


VHI-42 


646 


VIII-43 


647 


VIII-45 


648 


VIII-46 


649 


VIII-47 


650 


VIII-48 


651 


VIII-50 


652 


VIH-51 


653 


VIII-53 


654 


VIII-54 


655 


VIII-55 


656 


VIII-56 


657 


VIII-57 


658 


VIII-58 


659 


VIII-59 


660 


VIII-60 


661 


VIII-61 


662 
VIII-64


663


VIII-65 


664 


VIII-66 


665 


VIII-67 


666 I 


VIII-68 


667 


VlH-69 


668


Vlll-70 


669 


VIII-71 


670 


VIH-72 


671


VIII-73 


672 


VHl-74 


673 


VIII-75 


674 ! 


VIII-76 


675 


VIII-77 


676 


VIII-78 


677 


VIII-79 


678 | 


VIII-80 


679 


IV-02 


681


IV-04 


682 


IV-13 


683 


tV-14 


684 


IV-17 


685 


IV-28 


686 


IV-31 


687 j 


lV-32 


688 n 


IV-38 


689 ] 


IV-40 


690 


IV-42 


691 


IV-44 


692 


IV-47 


693 ! 


IV-55 


694 | 


IV-56 


695 


IV-61 


696 


IV-64 


697 


IV-65 


698 


IV-72 


699 


IV-73 


700 J 


IV-80 


701 


IV-85 


702 I 


IV-93 


703 


IV-95 


704 ! 


IV-96 


705 


V-03 


706 


V-04 


707 


V-07 


708 i 


V-08 


709 1 


V-09 


710 


V-12 


711 


V-18 


712 


V-20 


713 


V-24 


714 
V-37


716 


V-40 


717 


V-41 


718 


V-48 


719 


V-57 


720 


V-61 


721 


V-64 


722 


V-65 


723 


V-74 


724 


V-80 


726 


V-81 


727 


V-87 


728 






VI-13 


870 


VI-14 


871 


VI-16 


873 


VI-23 


878 


VI-24 


879 


VI-28 


883 


VI-32 


885 


VI-38 


886 


VI-39 


887 


VI-45 


889 


VI-46 


890 


VI-49 


892 


VI-50 


893 


VI-52 


894 


VI-53 


895 


VI-54 


896 


VI-55 


897 


VI-57 


898 


Vl-58 


899 


VI-63 


900 


VI-65 


902 


VI-66 


903 


VI-67 


904 


VI-74 


905 


VI-75 


906 


VI-76 


907 


VI-80 


908 


VI-81 


909 


VI-85 


910 


VI-87 


911 


VI-88 


912 


VI-91 


913 


Vl-94 


914 


VI-95 


915 


VI-96 


916 


1-13 


1177 


1-14 


1178 


I-30 


1180 
I-54


1181


1-88 


1182 


111-20 


1183 


IV-15 


1185 


IV-26 


1186 


IV-62 


1187 


V-11 


1188 


V-28


1189 


V-38


1190 


V-45 


1191 


VI-44


1193 


VII-47 


1200 


I-42 


1332 


I-52 


1333 


I-86 


1336 


I-95 


1337 


III-10


1342 


III-60 


1347 


III-82 


1348 


III-92


1351 


IV-23 


1353 


IV-34 


1354 


IV-35 


1355 


IV-41 


1356 


IV-45 


1357 


IV-82 


1359 


V-01 


1361 


V-02 


1362 


V-06 


1363 


V-17 


1364 


V-25 


1365 


V-35 


1366 


V-42 


1367 


V-47 


1368 


V-49 


1369 


V-58 


1370 


V-75 


1372 


V-79 


1373 


V-90 


1374 


V-91 


1375 


V-94 


1376 


VI-10


1379 


VI-41


1381 


VI-43 


1382 


VI-71 


1387 


VI-72 


1388 


Vl-79 


1389 


VI-90 


1390 


Vl-93 


1391 


VH-25 


1397 


VII-60 


1398 
VII-67


1399 


VIII-22


1403 


VlH-26 


1404 


VIII-39 


1405 


VIII-44 


1406 


1-37 


1440 


V-32 


1445 


V-52 


1447 


V-68


1448 


V-92 


1449 


VI-42 


1450 


VI-78 


1452 


VII-86


1453


VH-88 


1454 


IV-29


1490 


V-15 


1491 


V-39 


1492 


V-54 


1493 


V-59


1494 


V-71 


1495 
Table 5



Samples 



Diagnosis


No. of women 


Normal /Benign 


42* 


DCIS


3 


Invasive cancer 


26 



^ JFXQTTl one wujju.au, wuuic uiuuu ^ 7—7- ? ^ — w 

Hence, the number of unique normal/benign samples tested in the experiment is 75



Information about women with breast cancer 



• 

Sample 






Cancer type


Size hist 
(mm) 


Nodes 


1 


Q 1 




IDC


20 


1/7 


2 


o*f 


ii 


IDC 


22 


2/2 


o 


50 


i 


DCIS+ 
1IDC 


>50DCIS; 
5x14 


0/7 


4 


47 


i 


IDC


15 


0 


5 


69 


in 


ILC g.2 + tubular 
adenocarcinoma 


50+3 


l 12 + 1 av / 


ft 


50 


ii 


IDC


24 


0 


7 


65 


i 


IDC 


15 


0 


8 


63 


ii 


IDC 


23 


0 


9 


55 


i 


IDC + DCIS 


4 


0 av 1 


10 


52 


0 


DCIS + small 
colloid carcinoma 
foci 


50 + 3 


0 


11 


60 


II 


IDC 


24 


0 


I 12 


54 


1 




11 


0 


13 




0 


DCIS


20 


0 


14 


49 


0 


DCIS 


9 


0 


15 


48 


1 


IDC 


4 


0 


16 


56 




IDC 


4 


0 


17 


68 


1 


IDC 


14 


0 


18 


68 


1 


IDC 


7 


0 


19 


63 


1 


IDC 


10 


0 


20 


45 


1 


IDC 


19 


1 


21 


57 


III 


IDC 


60 


8/20 


22 


55 


II 


IDC/DCIS 


35+55 


0 


23 


71 


1 


IDC/extensive
DCIS


8 


o 1 


24 


56 


1 


IDC


9 ■ 


? 
25


66 


II 


IDC 


26 


S 0 


26 


66 


1 


IDC 


15 


? 


27 


61 


1 


IDC 


9 


? 


28 


? 


? 


? 


? 


? 


29 


65 


1 


IDC


11 


0 



Other diseases/conditions present in the women tested



Disease/condition 



Other diseases/conditions present in the women tested



Diabetes 



Asthma 



Ulcerative colitis



Hemochromatosis



Crohn's disease 



Fibromyalgia 



Psoriasis



Atopic eczema 



Rheumatism 
Allergies 



Prior history of cancer in the women tested 



Cancer type 


No. of women 


Breast 


3 


Colon 


2 


Stomach 


1 


Skin 


1 
Table. %

Some relevant features of the blood donors. B, female donors with breast cancer; N, female
donors with a suspicious mammogram but no breast cancer; IDC, invasive ductal carcinoma; DCIS,
ductal carcinoma in situ; na, not available; nd, not determined; ++, no degradation of mRNA and no
ribosomal contamination in the sample; +, no degradation of mRNA but ribosomal contamination in the
sample.







Age


Cancer type 

/DTBQSl 

abnormality 


Size Hist 
(mm) 


mRNA 
Quality 


1 


B1 


na 


IDC 


5 


++ 


2 


B2 


49 


DCIS 


8 


nd 
IDC


18 
IDC


ii 
\L 
DCIS + microinvasive cancer
List of sequences of probes informative for both Alzheimer disease and breast
cancer



Clone ID


Sequence ID 


1-94. 


308 


1-9^ 


309 


l-9ft 
l-^O 


310 




321 




327 


1-79 


333 




338 
360




361
1 1 u /
II-10


368
1 1 a- i
VI-09
VII-02


547
548
554
VII-11


556 


VII-15 


559 


VII-17


560 


VII-19


562 


VII-21 


564 


VII-22 


565 


VII-23 


566 


VII-24


567 


VII-27 


568 


VII-29 


570 


VII-32 


571 


VII-33


572 


VII-36


575 


VII-39 


576 


VII-41 


578 


VII-42


579 


VII-43


580 


VII-46 


583 


VII-48


584 


VII-49 


585 


VII-54 


589 


VII-57


591 


VII-58 


592 


VII-59


593 


VII-62 


594 


VII-63


595 


VII-64


596 


VII-66 


598 


VII-72
615
VIII-23


630 


VIII-24 
Nucleotide sequences

Sequence ID - 93 nt: 405 

GGATCCTGTGGCCCAC^GAGCTGCCCCAGCAGACGCTCCGCCCCACCCGGTGATGG 

AGCCCCGGGGGGACAATCGTGCCTGGGGAGGAGCAGGGTACAGCCCATTCCCCCAG 

CCCTGGCTGACCTGGCCTAGCAGTTTGGCCCTGCTGGCCTTAGCAGGGAGACAGGG 

GAGCAAAGAACGCCAAGCCGGAGGCCCGAGGCCAGCCGGCCTCTCGAGAGCCAGAG 

CAGCAGTTGAATGTAATGCTGGGGACAGGCATGCTGCCGCCAGTAGGGCGGGGACC 

CGGACAGCCAGGTGACTACCAGTCCTGGGGACACACTCACC^ 

GGCAGGACAGATCGGGGAAGGGGTGTGTACCAGGCTATGATTTCTCTTGCATTAAA 
ATGTATTATTATT 
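
Some record headers in this listing state a length (for example "nt: 405") and others give only the Sequence ID. As a minimal sketch, a reader working from the extracted text could parse a header and compare the stated length with the bases actually present, flagging characters other than A, C, G, T and N (such as "^", which marks text lost in extraction). The helper names `parse_header` and `check_record` are hypothetical, and the header pattern is an assumption based on the headers visible here.

```python
import re

# Assumed header shapes: "Sequence ID - 93 nt: 405" or "Sequence ID 110".
HEADER = re.compile(r"Sequence ID\s*-?\s*(\d+)(?:\s*nt\s*:\s*(\d+))?")


def parse_header(line):
    """Return (sequence_id, declared_length or None), or None if no match."""
    match = HEADER.match(line.strip())
    if match is None:
        return None
    declared = int(match.group(2)) if match.group(2) else None
    return int(match.group(1)), declared


def check_record(header, seq_lines):
    """Summarise one record: observed length and any unexpected characters."""
    parsed = parse_header(header)
    if parsed is None:
        raise ValueError(f"not a sequence header: {header!r}")
    seq_id, declared = parsed
    # Join the wrapped lines and drop all whitespace inside them.
    seq = "".join("".join(line.split()) for line in seq_lines).upper()
    unexpected = sorted(set(seq) - set("ACGTN"))
    return {
        "id": seq_id,
        "declared_nt": declared,
        "observed_nt": len(seq),
        "unexpected": unexpected,
    }


if __name__ == "__main__":
    print(check_record("Sequence ID 110", ["ACGT N", "AC^GT"]))
```

A mismatch between the declared and observed length, or a non-empty `unexpected` list, indicates that the record should be checked against the original publication before use.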

Sequence ID - 108 nt : 550 

GGCTTTGAC^GAGTGCAAGACGATGACTTGCAAAAT 

TAGANACCATGATCAACACCTTCCACCAATACTCTGTGAAGCTGGGGCACCCA 

ACCCTGAACCAGGGGGAATTCAAAGAGCTGGTGCGAAAAGATCTGCAAAATTTTCT 

CAAGAAGGAGAATAAGAATGAAAAGGT<^TAGAACAG 

CAAATGCAGACAAGCAGCTGAGCTTCGAGGAGTTCATCATGCTGATGGCGAGGCTA 
ACCTGGGCCTCCCACGAGAAGATGCACGAGGGTGACGAGGGCCCTGGCCACCACCA 
TAAGCCAGGCCTCGGGGAGGGGACCCCCTAAGACC^ 
GCCACGGCGACGGCCACAGTCATGGTC 

CAGGCCACCCTGCCTNTACCCAACCAGGGCCCCGGGGCCTGTTATGTCAAACTGTC 
TTGGCTGTGGGGCTAGGGGCTGGGGCCAAATAAAGTCTCTTTCTCC 

Sequence ID 110 

ACGAAGACAGACATCTGTGGAATGATTCACATCCTCTCAAGTTAGGAGGA 

CCTGCTTCATTAAGAAGCTGGGGGTAGGGTGGGGGTGGGGAGAACACTTAACAA^ 

TGGGGACCAGTCAGGGGAATCCCCTTATTTCTGTTTTGCATATGAGGAACCCTAGA 

GCAGCCAGGTGAGGCTCTCTAGTTTAATAAAAATCATGGAAAGACTCTTAATGCAG 

ACTCTTCTTAAGTGTTAATAGGGATTTTTTCAGCTTATTTTGGTTGCAGTTTCCAA 

TTTTTAAAAATGTTGAGGTAATCTTTCCCACCTTCCCAAACCTAATTCTTGTAGAT 

GCATTAGTGTTGAACCAATGCTTTCTCATGTCTCAATTCTTTGTATATGCATTCTT 

TTCAGATGTATTAAACAAACAAAAACCCTTC 

Sequence ID - 192 nt : 286 

CCGGTAATAGAATAGAAAAGGGAGAGTGTCTTCATGCAATGTGGCATCCTGGATTG 

GGTCTCGNNACAAAAACAGGACATTAGTGGGAAAATTGGAAATCTGAAAAAAGTCT 
GAATTTTAGTTAATATACCAATTTCAGTCTCTTGGTTTTGACAGATGTACCATGGT
GATGTAAGATGTTGACCTTGGGGTAGGCTGGGTGAAGGGTATACAGGAACTCTTTG 
TACTATCTCTGCAACTTCTCTGTAAATCTAGTATCATTCCAAAATAAAAGTTTATT 
TAATTT 


Sequence ID 250 

GTGGAAGTGACATCGTCTTTAAACCCTGCGTGGCAATCCCTGACGCACCGCCGTGA 
TGCCCAGGGAAGACAGGGCGACCTGGAAGTCCAACTACTTCCTTAAGATCATCCAA 
CTATTGGATGATTATCCGAAATGTTTCATTGTGGGAGCAGACAATGTGGGCTCCAA 
GCAGATGCAGCAGATCCGCATGTCCCTTCGCGGGAAGGCTGTGGTGCTGATGGGCA
AGAACACCATGATGCGC^GGCCATCCGA 

GAGAAACTGCTGCCTCATATCCGGGGGAATGTGGGCTTTGTGTTCACCAAGGAGGA 
CCTCACTGAGATCAGGGACATGTTGCTGGCCAATAAGGTGCCAGCTGCTGCCCGTG 
CTGGTGCCATTGCCCCATGTGAAGTCACTGTGCCAGCCCAGAACACTGGTCTCGGG 
CCCGAGAAGACCTCCTTTTTCCAGGCTTTAGGTATCACCACTAAAATCTCCAGGGG
CACCATTGAAATCCTGAGTGATGTGCACTGATCAAGACTGG 

Sequence ID 299 

CAGCGCAGGGGCTTCTGCTGAGGGGGCAGGCGGAGCTTGAGGAAACCGCAGATAAG 

AGGTTACTAAGATATTGCTTAGCGTTAAGTTTTTAACGTAATTTTAATAGCTTAAG 
ATTTTAAGAGAAAATATGAAGACTTAGAAGAGTAGCATGAGGAAGGAAAAGATAAA 
AGGTTTCTAAAACATGACGGAGGTTGAGATGAAGCTTCTTCATGGAGTAAAAAATG 
TATTTAAAAGAAAATTGAGAGAAAGGACTACAGAGCCCCGAATTAATACCAATAGA 
AGGGCAATGCTTTTAGATTAAAATGAAGGTGACTTAAACAGCTTAAAGTTTAGTTT
AAAAGTTGTAGGTGATTAAAATAATTTGAAGGCGATCTTTTAAAAAGAGATTAAAC 
CGAAGGTGATTAAAAGACCTTGAAATCCATGACGCANGGAGAATTGCGCATTTAAA 
GCCTAGTTACGCATTTACTAAACGCAGACGAAAATGGGAAGATTAATTGGGAGTGG 
TAGGATGAAACAATTTTGGAGAAGATAGAAG 




Sequence ID 300

CTCAAAGGAGAAAAAAAACCTTGTAAAAAAAGCAAAAATG 




CCTTTTTTGTCTATGAAGTTGCTGTTTATTTTTTTTGGCCTGTTTGATGTATGTGT
GAAACAATGTTGTCCAACAATAAACAGGAATTTTATTTTGCTGAGTTGTTCTAAAA 
AAAAAAAAAAAAAAAAA 
Sequence ID 302

AGTAGAGACGGGGTTTCACTGTGTTAGCCAGGATGGTCTCGATCTCCTGACCTCGT 
GATCCGGCCACCTCGGCCTCCCGAAAGTGCTGGGATTACAGGCGTGAGCCACGGCG 
CCCAGCCCCAGCCTGTG^CTTAAACTGATAAACGACAGATTAACAGTAGAAAAATT 
TTATTTTGCATACATAATGAGGCTTCACAAAAGAGAAGTGAAAACCCAAGTAGGAG
TTTAGGGCTGGGGGCTTATATACCATTTAACAAGGGGTGATAAATTGTAAGAGAAT 
AG 

Sequence ID 304

TCCTTGGTTTCGATTTGTGGCAACAATCCAGTCTTTTTGTTTTTTTCAGGGATACC
ATATGTAACAGGTGCCATTGTTACTGTAACTTTTCACACATGCCTTCAGTTTGATG 
TCAAAGTCATCATTTAGTGTAAACA^ 

CTTTACTTTTAGAAAGTCTTATCTTTTATGCCACAGAAATAGCATTTGGCTATTAG 
TCATGGATGGCAAAGAAATTAATTTTGAGTTGTTTGGATAAAAATGTTTCAGTTGA 

CTGTAGTGTGTATTGAGAGACACTGCCAGTAAACAAACTCTCTTGGTAGGTGGAAA
TCCCCTAGAAGTTACAGAAAATTGGGAGGAGGTGAACTTAATTAAATAACTTGAAT 
TGTTTAGACATATTCAGAGCTTCTTATGACCTTGAAGAAATCACCCAACTTCAAAA 
GACCTCGGTTTCTTCATTTGTAAAATTAGGGAGTTTGACTAGATGTGTAAATCTAG 
TTGTTAGTTAACTTCTAAGATGTAAAAACCCTCTTGTTTAACAAAAACCTACAAGA 

TCAAGTTGCTTATCTGAAATCTTTATGAATCAACACTAGTCACTAAGTCTAGCTCG
ACC 

Sequence ID 306

CTTTTCCTCCCGCTGTCCCCCACGGAGGGGACTGCTCTCCCCCGCTGCATCCTTTC 

TGTGAGGTACCTTACCCACCTCAGCACCTGAGAGGGTGAAATAGAATTCTAACCTC

GACATTCGGGAAGTGTTTTTGAGAAGTCTCGGTCGGTAAGGGAAGTCTTCCAAGTC
CGTGCAGCACTAACGTATTGGCACCTGCCTCCTCTTCGGCCACCCCCCAGATGAGG 
CAGCTGTGACTGTGTCAAGGGAAGCCACGACTCTGACCATAGTCTTCTCTCAGCTT 
CCACTGCCGTCTCCACAGGAAACCCAGJ^ 

CAAGGCATTTATTGCAGTGTACTATTTGCTTCCAAAGGATCAGGCCCTGAGAACAA

TGACCTTATTTCCTACAACAGTGTCTGGGTTGCGTGCCAGCAGATGCCTCAGATAC 
CAAGAGATAACAAAGCTGCAGCTCTTTTGATGCTGACCAAGAATGTGGATTTTGTG 
AAGGATGCNCATGAANAAATGGACNAGCTGTG 

Sequence ID - 308 nt : 373

AAGTGGGTCTTGCCATCCCTGAACTGNAATCATCCCTAACATATTCATACCTGTTT 
TCATTTTAAAAGTTGGGTCAGTTTTTTTATTAGTACATGTATTTCTATCCTACTGA 
TTTATTTGCTATATCATCTAATTTAGTTTGAATATTCCATAATTTACTTAATTAGT
CCTGTATGGAGACCTAGCTCTTCTCAGTGTCTACTATTATAAACAATGCTACAGTG 
AATATTGGTGNATAAATCCATACNCACCACGTACATATCTTAAGTTCTGGAAGAGA 
TATTGCTAAACCAGAAGATAACCTGCATTTAAAATTTGACTGCTAGGGNCAGGGNC 
ACATTTAATTAAATTAGAACAANGAATGCATAATGNC

Sequence ID 309 

CCGGAATCGCGGCCGCGTCGACGAAAATATGTGCCCTGGCCAACTCCACAGGACTA 
GTTCTAGGCAATCTGAAGGAAACCAGAAAATGTGAATTTCTCTTCCCTCAAAAAGC 
TATACTGAAGTAGTATTTAATATTCAAGTACTTGTAAATTTGCAGAACAGTACTTT
TTAATTTGACCCATGAATTCTATTTAAATTTGTCACTTAATATTTAGCCAAGAAGC 
AAACCATCTAAAAAGATTTCTGGTTTATTTCTCCAACTCCTAATAAATAGGGTCAC 

AATGAAACAACACAAAGGGATATGTTTTGAAAAGTGGTCTTGCCATCCCTGAACTG 
TAATCATCCCTAACATATTCATACCTGTTTTCATTTTAAAAGTTGGGTCAGTTTTT
TTATTAGTACATGTATTTCTATCCTACTGATTTATTTGCTATATCATCTAATTTAG 
TTTGAATATTCCATAATTTACTTAATTAGTCCTGTATGGAGACCTAGCTCTTCTCA 
GTGTCTACTATTATAAACAATGCTACAGTGAATATTGGTGNATAAATCCTACACAC 
CACGTAACATATCTTAAGTTCCTGGAAGAGATATTGCTAAACCAGAAGATAACCTG 
CATTTAAAATTTGACTGCTAGGGTCAGGGTCACATTTAAATTAAATTAGAACAAGG
AATGCATAATGTCTTCGATAGCAATCTATTCAAGGTGCACCGTGGTCACAZ^AGGAA 
AGCAAAACTGTC 

Sequence ID - 310 nt:564 

CCTGGNCAGAGGCCTCTATCCTC

ATTTAATTTCTATGGGNAATAGTCCTTTTCTTAGCTTCTGCCNNTCACTTGCTTAT 
TTTTTGTGTGGGAATGGGGTTGGATAAACCAATGAACTTTATTATAAACAAATCCC 
ACCTATATCTANCAAATTTATATTTTCGGTGAAATACAGATATTTGCCTTTGTGGA 
GTANTATAGAAGCTGTCAATATGTATCTACTGTACAGTACTAAATAGTATTCATTT 

ATGAAATGAGTAGTGTTTGGGTGGCTGGGGTTAAGGAAAAATGAGACTTGGAATTG
TAGCTTTTATCCAAGTTTTGAGTATAl^ATAGGGTTTTGTTTTGTTTTTTTTAACCT 
AAAAACTGAAATGCCATATAGAAAAACAGCATTGTTTTTACAGTTTGTAGTAAGTA 
ACTTTTTAAAGATTTTATCAAAAAGAATTTTGTCTATNGTGAGTAAAAGAAGTTCT 
AATAATGGCCTAATCACTGCATTTTTAAAAAACAAAGTTCAACACAAATGACATTT 

GTTT

Sequence ID 311
CCTCTCCTCCATCTAAAGGCAACATTCCTTACCCATTAGTCTCAGAAATTGTCTTA
AGCAACAGCCCCAAATGCTGGCTGCCCCCGGCCAAGCATTGGGGCCGCCATCCTGC 
CTGGCACTGGCTGATGGGCACCTCTGTTGGTTCCATCAGCCAGAGCTCTGCCAAAG 
GCCCCGCAGTCCCTCTCCCAGGAGGACCCTAGAGGCAATTAAATGATGTCCTGTTC 
CATTGG 

Sequence ID - 313 nt: 554 

CCCGGAATCGCGGCCCGCGTCGACAACAAACCTGCATGTTCTGCACATGTATCCAG 

GAACTTAAAAAAAAAAAAAGATAGTTTGTGTGTCTTAATTGAATAATAGTAGATTT 

ATAGATTAAAGATCTATGGGTTTTTAATATGGATTANAAATCTGTGGGTTTTTGAT 

ATGGATTANAAATCTGTGGGTTTTTAATATGGATTGGAAATCTGTGGGTTTTTAAT 

ATGGATTAAAAAACATCTGTGGGTTTTTAATATGGATTAAACATCTGTGGGTTTTT 

AATATGGATTAAACATCTGGGTTTTTAATATGGATTAAACATCTGTGGGTTTTTAA 

TATGGGTTAAAAATC^^AAGAAAATGAACTATTTGCTCCAGTGCAGGAAAATACAG 

GCAATACTGGATACAATTAGATGGTCAGGAGCGATAACCCGGTTGCCATTGTTTGA 

AGAAGAGAATAAGGNGCTAGCATTCCTATCCGTAGATAATTTGACAGCTAGGAAAT 

AGGGGGAGTCTTCTATGTAGTTAGTGAAGGCTAAATGAACTATTATATGC 

Sequence ID 314 

CTTTTCCTCCCGCTGTCCCCCACGGAGGGGACTGCTCTCCCCCGCTGCATCCTTTC 
TGTGAGGTACCTTACCCACCTCAGCACCTGAGAGGGTGAAATAGAATTCTAACCTC 
GACATTCGGGAAGTGTTTTTGAGAAGTCTCGGTCGGTAAGGGAAGTCTTCCAAGTC 
CGTGCAGCACTAACGTATTGGCACCTGCCTCCTCTTCGGCCACCCCCCAGATGAGG 
CAGCTGTGACTGTGTCAAGGGAAGCCACGACTCTGACCATAGTCTTCTCTCAGCTT 
CC^CTGCCGTCTCC^CAGGAAACCC^GAAGTTCTGTGAAC^GTCCATGCTGCCAT 
CAAGGCATTTATTGCAGTGTACTATTTGCTTCCAAAGGATCAGGCCCTGAGAACAA 
TGACCTTATTTCCTACAACAGTGTCTGGGTTGCGTGCCAGCAGATGCCTCAGATAC 
CAAGAGATAACAAAGCTGCAGCTCTTTTGATGCTGACCAAGAATGTGGATTTTGTG 
AAGGATGCACATGAAGAAATGGAGCAGGCTGTGGAAGAATGTGACCCTTACTCTGG 
CCTCTTGAATGATACTGAGGAGAACAACTCTGACAACCACAATCATGAGG 

Sequence ID 315 

TGGTACAGATACAAACTGGACTCTCAGGACAAAACGACACCAGCCAAACCAGCAGC 
CCCTGAGCATCCAGCAGCATGAGCGGAGGCATTTTCCTTTTCTTCGTGGCCAATGC 
CATAATCCACCTCTTCTGCTTCAGTTGAGGTGACACGTCTCAGCCTTAGCCCTGTG 
CCCCCTGAAACAGCTGCCACCATCACTCGCAAGAGAATCCCCTCCATCTTTGGGAG 
GGGTTGATGCCAGACATCACCAGGTTGTAGAAGTTGACAGGCAGTGCCATGGGGGC
AACAGCCAAAATAGGGGGGTAATGATGTACGGGCCAAGCACTGCCCAGCTGGGGGT
CAATAAAGTTACCCTTGTACTTG 

Sequence ID 316 

CGCCACTTATCCAGTGAACCACTATCACGAAAAAAACTCTACCTCTCTATACTAAT 
CTCCCTACAAATCTCCTTAATTATAACATTCACAGCCACAGAACTAATCATATTAA 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 

Sequence ID 321 

CAGAACAGTACTTTTTAATTTGACCCATGAATTCTATTTAAATTTGTCACTTAATA 
TTTAGCCAAGAAGCAAACCATCTAAAAAGATTTCTGGTTTATTTCTCCAACTCCTA 
ATAAATAGGGTCACATATTTTTTAACTTTTTTCTAATTTGAAAAGTAATACAGGCA 
TATGGTATTTTAAAAATGAAAGAACA^ 

GCCATCCCTGAACTGTAATCATCCCTAACATATTCATACCTGTTTTCATTTTAAAA 
GTTGGGTCAGTTTTTTTATTAGTACATGTATTTCTATCCTACTGATTTATTTGCTA 
TATCATCTAATTTAGTTTGAATATTCCATAATTTACTTAATTAGTCCTGTATGGAG 
ACCTAGCTCTTCTCAGTGTCTACTATTATAAACAATGCTACAGTGAATATTGGTGT 
ATAAATCCATACACACCACGTAACATATCTTAAGTTCCTGGAAGAGATATTGCTAA 
ACCAGAAGATAACCTGCATTTAAAATTTTGACTGCTAGGGTCAGGGTCACATTTAA 
ATTAAATTAGAACAAGGAATGCATAATGTCTTCGATAGCAATCTATTCCAGGTGCA 
CCGTGGTCACAAAGGAAAGCAAAACTGTCAATAACTTTCTTCTCA 

Sequence ID 322 

CTTTATTGAGGTTCGAAATTAATAAAGAAAATAAAAGAAATGTATCTTCATTCATT 
CTGTATGTTAGTGTTTTAATTACCCTTAGAATATATGGATAAAAAATACTATTCTT 
TGTCTTGGAGAAGGTAAGAGTCTAGTTAGATGAATAAGGGTTATCTATGTAGAACA 
ACTAGAGAATGAGAAGAGAGCTTATGAGATTGAGTACTACGTTATGCAGTAGAGTA 
GCACGTCATCTGCTACTGAGTATGGTGTGATAACATTGTGTAACAGGAAAGTATGA 
TCAATATCTACTTAAAATTAAGGACAATATTAGCACTACATTGCTTTATTTTAAAG 
TAAAAATTAGAGAACTAAACACAAGCATTGTAAGTACAATAAAAGCTGATCTTTCT 
AGTTAAGCAGAATAATACATGTTCAAGCATCTGCTAAATCATTAAATATAAGAATA 
TAGGGGTTTTCTATAATCTTATTTTCTTTGGAAGAGTACCTCATTTTCAAGANGAG 
AAGTTTCTAATTGCCACTTCTTTAAAAATAAAACAGGGTTTTAATGTTCCCAGCAC 
AAAAATTAATATCTCTTCAAAAAGTCTCTTGTGATTAAGTTTGAATCCCTTGTCAT 
ACTGCTTCTAATATTGACACTGACCTCCTTAGGTATTTTTCAGGGGTTATAATCTT 
TTCTTAAGGTATCTTTTTTCAAGAATTGGATACCTTGGGCTT 
Sequence ID 323

CGCGTCGACTTTTAAAGTCATCTCTATAGGAAGGTGCTGGGCAGGGATCCCAGAGA 
AAGAAAGGGTCC^GACTCC^TTAACTGCCCTGGATGAAGGG(^CTGCTA(^GCAG 
CTAGTACCAGAGACTCTCCTATCTCACGGTTGAGGCAGACCCAGGATAGAATAGAG 
AATAAAAGGAATGCTTATAGGAAACAATTTTGTATGGAATGCTAGATGGCCAAGCC
TCAGCCTTTGGTCCAGTGCAACCCTTGCCTCGCTTGTCAACAGTGAAAAATTAGTT 
TGGTTAGAAGAACCATCTGGAAACACACCAGCTTCTGCTACCTTCATGCTCATTGT 
TAAAAAAAGATTAACCAGTGTGAACATTCTGATCTGTTAATTCCAGGGACTGTTTT 
CTTTCCAATGGACTGTTTGTTGGTAGAATAACCCCCAAAAGCTCAAAGCTAAAATG 
CATCATCAGTCCTAGTCGGCAGTTCCTTAAGAATGGACTGGCGGCGTGGTTGAGCT
GATATGGAAAAGCTGCACCTTCCTGCAGAAGATCAACTGACCTGCTATCCCACCCC 
AAATTCAACCTGAGGTATATTTCAGTGAAGCAGGTAGCTGTGCTTCTCAAAGCAGA 
GAAGCAGTTTTAAGAACCAT^AAAGGTAGAGGAAATCTA 

Sequence ID 324

GTTTGTTACAGGCAGAATTGGATAGATACAGCCCTACAAATGTATATGCCCTCCCC 
TGAAAA?^AATTGGATGAAAATCTGCACAGCAAAGTGAAACACACAGATAATAGGAA 
CAAAATGTAGTTCCCATGTGCCAAACAAAATAAATGAAATCTCTGCATGTTTGCAG 
CATATCTGCCTTTTGGGAATGTAATCAAGGNATAATCTTTGGCTAGTGTTATGTGC 

CTGTATTTTTTTAAAATGGTACACCAGAAAAGGACTGGCAGTCTACTTCTACCATA
GTTAAACTTCACCCTCTTTAATTTCACAACATATTCTTTGGAAGCAGGAAGAAATG 
CTCATAAAGAGGATCAGACCTTCTTTCCCGTGAAACCAGTATTTGGCGCCATATAT 
AAGCCTGGTTAAATTGGTCATCTAAAGCTGTCAAATAAGACATTCTGTGAAAGGTA 
AAGATCGAAACTGGTTATAAGTAAAACCATC!^^ 

ACCTTTGAAGCTTATTGTCTGGCCTGCACCAGAAGATGTCTGCATTACTCATTGCT

AAAAATGTGTACACAGAACTGCACTAGGATTAATTGGTTCAAGAAGAAATTTAAAC 
TTACGTTTGGGTTTCCATACAGCACTCTATTGAATACATGCATCTGAATTTAAGTT 
GCAA 

Sequence ID 325

GACCAGTAATGGCTTTTAAGAGTCCATTTTGTCATTGTCTCCCTAGTTAATTACAG 
GTGGGGGATCTTTTGCCTCTATTCTCTTCATATTGAAATGAATCATACTCATGTTT 
TGTGGAACTCCTTAAAGTTGTAGCTGTCATGATCAGATTTTTTTTATATTTCCTCA 
GCTTAACTCTGCTACTTGATTTACAGTGACCCATAACCTACTCATCCTTGGTTTAT 

AGTGACACATAATCTTATCTCTTTATAGAACCTTAAATTTTATCATTATTTTCGCT

TAGAATACAGCATTTCTTTGCTTCTGTTGCTGGTTTGACTTAAGAAATAAGGCAGT 
AACTCTGATCAATCAATTATCCATAAGGAAGGGCTTTTCATGGGTTCTATTAATTT 
GTTAGTACCCTAAGTATATCTGAAAAATATGTCTATTGAGAGAAGATTTTGGCATT
CCAGATGGTATAGTCTATATATATTTAAAGTTTTGAATTTGCTTATATATACTCAG 
CTTTCTTTTTCTAGCATTTTTGCATTTACCTGTTAATTGAAGTATACCCCCCACAT 
ATAAAA.GTTCCTCTTAAAGACACTGGACTCTTTCTGGGGGGCTAAAATA 

Sequence ID - 326 nt : 554 

CCCGGAATCGCGGCCCGCGTCGACAACAAACCTGCATGTTCTGCACATGTATCCAG 

GAACTTAAAAAAA?^AAAAAGATAGTTTGTGTGTCTTAATTGAATAATAGTAGATTT 

ATAGATTAAAGATCTATGGGTTTTTAATATGGATTANAAATCTGTGGGTTTTTGAT 

ATGGATTANAAATCTGTGGGTTTTTAATATGGATTGGAAATCTGTGGGTTTTTAAT 

ATGGATTAAAAAACATCTGTGGGTTTTTAATATGGATTAAA.CATCTGTGGGTTTTT 

AATATGGATTAAACATCTGGGTTTTTAATATGGATTAAACATCTGTGGGTTTTTAA 

TATGGGTTAAAAATCAAAAGAAAATGAACTATTTGCTCCAGTGCAGGAAAATACAG 

GCAATACTGGATACAATTAGATGGTCAGGAGCGATAACCCGGTTGCCATTGTTTGA 

AGAAGAGAATAAGGNGCTAGCATTCCTATCCGTAGATAATTTGACAGCTAGGAAAT 

AGGGGGAGTCTTCTATGTAGTTAGTGAAGGCTAAATGAACTATTATATGC 

Sequence ID 327 

CGGCTACCGACAGAAGGACTATTTCATCGCCACCCAGGGGCCACTGGCACACACGG 
TTGAGGACTTCTGGAGGATGATCTGGGAGGGGAAGTCCCACACTATCGTGATGCTG 
ACGGAGGTGCAGGAGAGAGAGCAGGATAAATGCTACCAGTATTGGCCAACCGAGGG 
CTCAGTTACTCATGGAGAAATAACGATTGAGATAAAGAATGATACCCTTTCAGAAG 
CCATCAGTATACGAGACTTTCTGGTCACTCTCAATCAGCCCCAGGCCCGCCAGGAG 
GAGCAGGTCCGAGTAGTGCGCCAGTTTCACTTCCACGGCTGGCCTGAGATCGGGAT 
TCCCGCCGAGGGCAAAGGCATGATTGACCTCATCGCAGCCGTGCAGAAGCANCAGC 
AGCAGACAGGCAACCACCCCATCACCGTGCACTGCAGTGCCGGAGCTGGGCGAACA 
GGTACATTCATAGCCCTCAGCAACATTTTGGAGCGAGTAAAAGCCGAGGGACTTTT 
ANATGTATTTCAAGCTGTGAAGAGTTTACGACTTCAGAGACCACATATGGTGCAAC 
CCTGGAACAGTATGAAATGTGCTACAAAGTGGTACAAGATTTATTGATATATTTCT 
GATTATGCTAATTTCAATGAAGATCCTGCCTTAAATATTTTTTAATTTAATGGGAN 

AT 

Sequence ID 328 

CAAGACTCCATCTCAAAAAAAAAAAAAAATCTACAGTGCTGAGTATATAAAATTAT 
TAACACATTTCACAACAATATGTGTTTGTGGAGTTAAATATTTTTTGTCTTTAAAA 
CAGGTAATTTTAGTGCATACTTAATTTGATGATTAAATATGGTAGAATTAAGCATT 
TTAAATGTTAATGTTTGTTACATTGTTCAAGAAATAAGTAGAAATATATTCCTTTG 
TTTTTTATTTAAATTTTTGTTCCTCTGTAAACTAAAAGAACACGAAGTAATTGGTC
ACAATTACTGGTGTTTAACTGCCAAATATGGGTAAATAAGGGAAAATTTTGTTTAA 
TATTTAGTCCTTCTGAGATGGCTTGAATATTTGAATTTTGTTGTACGTCTATACTG 
GGTAGTCACAAGTCTTATAAACACTTTAGAGGAAAGATGGATTTCAGTCTGTATTT 
TTAAACATCATTTATTTTAAATCTGGTGCTGAAAAATAAGAAAAAAATTAAACTGC 
ATTCTGCTGTTCTTCTTTANAAGCATTCCTGCGTAAATACTGCTGTAATACTGTCA 
TGCAAAGTGTATCCTTTCTTGTCGTATCCTTTTTGGGGCAGTGGTTTTT 

Sequence ID 330

GCGGGAATCGCGGCCCGCGTCGACCTCAAAGGAGAAAAAAAACCTTGTAAAAAAAG 
CAAAAATGACAACAGAAAAACAATCTTATTCCGAGCAT 

TATGTACTTAGCTGTACTATAAGTAGTTGGTTTGTATGAGATGGTTAAAAAGGCCA 
AAGATAAAAGGTTTCTTTTTTTTTCCTTTTTTGTCTATGAAGTTGCTGTTTATTTT 
TTTTGGCCTGTTTGATGTATGTGTGAAACAATGTTGTCCAACAATAAACAGGAATT 
TTATTTTGCTGAGTTGTTCTAAAAAAAAAAAAAAI^AAAAAAAAAAAAAAAAAAAAA 
AAAAAAAT^AAAAAATTTTAAAATTTTTAAAATAAAACCCTTGGTTAT 

Sequence ID 331 

GCCGCGTCGACCTGCATGAGCCACAGTTTCTTGACTGGAGGCCATCAACCCTCTTG 
GTTGAGGCCTTGTTCTGAGCCCTGACATGTGCTTGGGCACTGGTGGGCCTGGGCTT 
CTGAGGTGGCCTCCTGCCCTGATCAGGGACCCTCCCCGCTTTCCTGGGCCTCTCAG 
TTGAACAAAGCAGCAAAAGAAAGGCAGTTTTATATGAAAGATTANAAGCCTGGAAT 
AATCAGGCTTTTTAAATGATGTAATTCCCACTGTAATAGCATAGGGATTTTGGAAG 
CAGCTGCTGGTGGCTTGGGACATCA]OTGGGGCCAAGGGTTCTCTGTCCCTGGTTCA 
ACTGTGATTTGGCTTTCCCGTGTCTTTCCTGGTGATGCCTTGTTTGGGGTTCTGTG 
GGTTTGGGTGGGAAGAGGGCCATCTGCCTGAATGTAACCTGCTAGCTCTCCGAAGC 
CCTGCGGGCCTGGCTTGTGTGAGCGTGTGGACAGTGGTGGCCGCGCTGTGCCTGCT 
CGTGTTGCCTACATGTCCCTGGCTTGTTGAGGCGCTGCTTCAACCTGCACCCCTCC 
TTGTCTCATAGATGCTCCTTTTGACCTTTTCAAAATTAATATGGATGGGAAAGCTC 
CTATGCCTTTTGGCTTCCTGGTAGAAGGCGGGATGCCCAAGGGTCTGCCTGGGTGT 
GGATTGGATGCTTGGGGTGTGGGGGTTGGAAACTGTCTTGTGGCCCACTTGGGCCC 

C 

Sequence ID 335 

CCCGCGTCGACTTTTAAAGTCATCTCTATAGGAAGGTGCTGGGCAGGGATCCCAGA 
GAAAGAAAGGGTCCAAGACTCCATTAACTGCCCTGGATGAAGGGGACTGCTACAGC 
AGCTAGTACCAGAGACTCTCCTATCTCACGGTTGAGGCAGACCCAGGATAGAATAG 
AGAATAAAAGGAATGCTTATAGGAAACAATTTTGTATGGAATGCTAGATGGCCAAG
CCTCAGCCTTTGGTCCAGTGCAACCCTTGCCTCGCTTGTCAACAGTGAAAAATTAG 
TTTGGTTAGAAGAACCATCTGGAAACACACCAGCTTCTGCTACCTTCATGCTCATT 
GTTAAAAAAAGATTAACCAGTGTGAACATTCTGATCTGTTAATTCCAGGGACTGTT 
TTCTTTCCAATGGACTGTTTGTTGGTAGAATAACCCCCAAAAGCTCAAAGCTAAAA
TGCATCATCAGTCCTAGTCGGCAGTTCCTTAAGAATGGACTGGCGGCGTGGGTGAG 
CTGATTTGGAAAACTGCCCTTCTGCAAAAAACACTGGCCTGCTTTCCA 

Sequence ID 337 

CAAGACTCCATCTCAAA2\AAAAAAAAAAATCTACAGTGCTGAGTATATAAAATTAT
TAACACATTTC^CAACAATATGTGTTTGTGGAGTTAAATATTTTTTGTCTTTAAAA 
CAGGTAATTTTAGTQG^TACTTAATTTGATGATTAAATATGGTAGAATTAAGCATT 
TTAAATGTTAATGTTTGTTACATTGTTCAAGAAATAAGTAGAAATATATTCCTTTG 
TTTTTTATTTAAATTTTTGTTCCTCTGTAAACTAAAAGAACACGAAGTAATTGGTC 

ACAATTACTGGTGTTTAACTGCCAAATATGGGTAAATAAGGGAAAATTTTGTTTAA
TATTTAGTCCTTCTGAGATGGCTTGAATATTTGAATTTTGTTGTACGTCTATACTG 
GGTAGTCACAAGTCTTATAAACACTTTAGAGGAAAGATGGATTTCAGTCTGTATTT 
TTAAACATCATTTATTTTAAATCTGGTGCTGAAAAATAAGAAAAAAATTAAACTGC 
ATTCTGCTGTTCTTCTTTAGAAGCATTCCTGCGTAAATACTGCTGTAATACTGTCA 

TGCAAAGTGTATCCTTTCTTGTCGTATCCTTTTTGGGGCAGTGGTT

Sequence ID 338

CTGGACTGCATGACCAGATCTGATGGGTGAGACTCAGGTGGCATGGAAGAGCCGAA 
AGAGGATACCATATGTGGGTGCCGGGGGGGATAGGTGAGAAGTACTAGAAGGCGGA 

ATGGAAGGACACTTCTGCTCAGCTCTGTGACACGGGCAGGGACCCTGCAGGGCTCA

GGTCCTTTAACACAGGA.GCT 

AGTATGCAGACTAAGCTCTTGCTTGGCTGATACGGCTTTTTGGGTTTTTAGAGAAC 
ATGCATATATGTTCTCATTCATGGTACATGAACTCAGAAGCCTTACTGCCTATTTT 
TGTTAATACTTCTGGGCAAACATTACCACT 

TTGTAAAATGTTATTTAATAAAGCCAAAGAACTAAATCATATTTATTTTCCAAGGN

TTTCTAAGATCTCTGAAACTAATGAGGTTTTTTAAATCCCCATTAAGTACTCATCA 
CTGCTAGTAAAAGCAGTTGTCTTTACCTTTAATTCCAGTGAGTCCCCTTAAATTTA 
TTTTTTATTATCTTTGGCTACATTGCCTTAGACAAAATGTGGTCACCCTAATTTAA 
NGGATAAAATTCACATCCTCACAGATTTCTTATTAAGAGGGTCTAANCCTTGAATA 
ATCANCAGTGGAAATGGAAGTCTTCTTTACTGGNTTTNATCCTTTCCCTTTTTTAT
CCCATG 
Sequence ID 339

TTTTTTTTTAAATAAAGCTGTCGGCACTCAAGGGTAATTTCATATCAGTGTGNTCT 
ACAAGCTGGGGGAAAATGAGTTCTAATTGTCANAGCTACCAAATCCTTCACCTTTA 
GCATAAAGGTTTAAAGATATCAC^AAGATGCCAAGTGATTAATAATGTTTTAAACC 
ACCCCTTTTTCTGTCTGAAAAAACAACTAAAACAATATTACAACAGTATAGTTACA
GAAGGGTTCTATTTTCATATGTTTTATGCACACTGTGCCTCAAAGGTACTATTTAA 
ATATATATACTTTTGAGGGGGTGGCTAATGCAGAAACACCCAAGACCTAAGGAAGA 
TACAACCCCATTTCTAGGTGTGAGGTCTAAATGCTTCACACACCCACTTGTGACCT 
TTTTTCATGAAGAATCATAACACTGTGCAGTGAGAAACAGTGGCAAAGC^ 
AAAGCATTTTAAATTATTTACTAGGTTAAAAGGGTGAACTGATACTTTAAATACAT
CAAATTTCATCAT 

Sequence ID 360 

GCAAGTGAGAGCCGGACGGGCACTGGGCGACTCTGTGCCTCGCTGAGGAAAAATAA 
CTAAACATGGGCAAAGGAGATCCTAAGAAGCCGAGAGGCAAAATGTCATCATATGC
ATTTTTTGTGCAAACTTGTCGGGAGGAGCATAAGAAGAAGCACCCAGATGCTTCAG 
TCAACTTCTCAGAGTTTTCTAAGAAGTGCTCAGAGAGGTGGAAGACCATGTCTGCT 
AAAGAGAAAGGAAAATTTGAAGATATGGCAAAAGCGGACAAGGCCCGTTATGAAAG 
AGAAATGAAAACCTATATCCCTCCCAAAGGGGAGACAAAAAAGAAGTTCAAGGATC 
CCAATGCACCCAAGAGGCCTCCTTCGGCCTTCTTCCTCTTCTGCTCTGAGTATCGC
CCAAAAATCAAAGGAGAACATCCTGGCCTGTCCATTGGTGATGTTGCGAAGAAACT 
GGGAGAGATGTGGAATAACACTGCTGCAGATGACAAGCAGCCTTATGAAAAGAAGG 
CTGCGAAGCTGAAGGAAAAATACGAAAAGGTA 

Sequence ID - 361 nt : 622

CTGTNATNGAATCTGCTTGTNACTNAAATGCTAAACTCAATTCTGTAATTCAATAG 
GTGCACCTNTCTGAGAAACATANNAGACAATGAGGAAAAGGATTCANCATTCCGTG 
GAATTTGTACCATGATCAGTGTGAATCCCANTGGCGTAATCCAAGTAAGATGTTCA 
CAAAGATTTGTTTTTAATGTCTAATTAATAAAATTTTAAAGGAAGAAACATTCTAA 

TACTTTAATTATAA2^AGTTAACTATTTTCAAAGGTATCAAAATACAGTTZ^AACCT
TTAAAATGTATATTTCTTAATATCTTGAAATTGTAATGCCTTTTTTTTTTCCTAAA 
TTTTTTTTGTCATGAAATGAGATAGTAACAGCAGATTGGGACAACAAGGTTATATT 
CTTGTCTTGAATCAGGCCATGGCTTCTTTCATCCAAATTTCAGACCTCATTTATTT 
ACTTTGTCCCTGCCTCCCATCCCTGGATATCANGTTTGTGGATATCTACAGTTAAT 

AGAGTGACCAAATAGTAGGAATACTGTCTCTCTATTCTGAATAAAATACTTTGAAT
CAGATTTAGAAATAATGAATAAAATACAAATCACCATTGAAATTGCTCTAATTTTG 
AGAGCT 
Sequence ID - 363 nt : 628

ATCACNTGAGGCAAGAGTTT^^ 

ACAAAAATATAAAAATTAGCCTGGGTGGTGATGGGCACCTGTAACCCCAGCTACTC 
GGGAGGCTGAGGTAGGAGAATCACTTGAACCCGGGAGATGGAGGTTGCAGTGAGCC 
AAGATCGTGCCACTGC^CTCCAGCCTGTGTGACAGAAC^GACTCTGTCTC^AAAA
AAAATAATAATAATAATAATAATAAAAAGGAATAACATAGCTAGGAATAAATTTAA 
TCAAAGAGGTGAAAGACTTATACACTTAAAACTACAAAAAAAAAATCACTGAAGGA 
ATTATAGACCCAAATAAAAATAAATAAAAAGACATTCTGTGTTTTAGGGAAAGAAG 
ACTTAATATTGTTAAGATGTCAATACTACCCAAAGTGATCTACAGATTCAACATA 
TCCCTATCAAAATTCCAACAGC^

TTCAGATGGAATTGCGAGGGGTTCTGAATAACAAAAACAATCTTGGGGAAAAAAAA 

CAAAAAAGAAAGTCAAAGAACTCACACT^ 

ATAGTAATCAAA 

Sequence ID - 364 nt : 528

TGAACATCCAGCCATGTCATTTCTTCCATTCCTGCCCTGGAGTAAAGTAGATTTAC 
TGAGCTGATGACTTGTGTGCATTTGTACATTGCAACCTTAGCTTACCTCTTGAAGC 
ATGTAGAGCATTCATCACCGACCAT 

TCGTGGTCTGTCTGCTCCCTGTGCCACCCCCACCCCATCAGGTGGGCCTTTTGCAA 
GTGATGAAGTCACCTGTGGGGGAAGAGCTTTCCTTTCCTCTCCTCAACTCAGAAGG
CCTCTTCCTCTTGCTCAAGAGGGTGCTGCTGCTTTCTGCCTCCTTCCCCGGCCGGC 
CTCCATCCCAGTTCACCTTTTCAGAAATGGCCCCTCAGTCAACTCTTCCCTTTTCT 
CCTGGCTTTTTATTTCTCCCAGTCTCTTAAGAGTATCCTTAGCTTTAAAAACAATA 
ACACAGAGGATGGGTGCAGTGGCTCATGCCTGTAATCCCAGCACTTTGGAGCCTGG 

GGCGGGCGGATCACTTGAGGNCA

Sequence ID 365 

GTCCCGGAATCGCGGCCGCGTCGACCTTTTCTATGCCTGCTATATAAACAGTACCT 
TGCAAGATGTCCTGTCTGATATCCACAAAGGGGTATTGTCAACCCCAAGTTCAGAC 

AGCTTTGTATTCTTCTGTCCCTGGATACATGAATTACTGCCATCTTTACACAGCGC

CCTAAAATACCAACGCGAAGTTACCTGCTCAGCTTGAAGCTGCGCTGTACCCTGGA 
ACCAG(^CTTCTGCTGAATGACT(^GGATGAAGCCTCGACTTCTCCTTCCCATCCC 
ATGCCCAGACCCCAGTGGCTCCTTTCCC^TCTGATCCAGTGACTTTAAGTCCAGC 
TGTTGCAACCTGGGCATGAGGAGGAGTGCAAGATGGCTTTGTCCTACCTGGAAAGA 
GGCTTTCTGGA

Sequence ID 366 
C^CCATTTACACACAGTGGGTCCTTGAATAGCATCGTTTTATTCAATGTCATTTTG
TTATAACATTGAGAAAAAAATTGATTCCCGGCTGGGGCCACTGTCTGTGCACCGT 

Sequence ID - 368 nt : 329 

GAAAGATCTAAAATCGAG&CCCTAACATC 
GGAZ^TTCAAAAGCTAGCAGAAGGCAAGAAATAACT 
AGAGATAGAGACACAAAAAACCATTCAAAAAAAAACAATGAATCCA 
TTTTAAAAAGATCAACAGAATTGACAGACTGCTAGCAAGACTAATAAAGAAGAGAG 
AAGCATCAAATAGACTCAATAAAAAATGATAAAGGGGATATCACC^CCAATCCCAC 
AGAAATACAAACTACCATCAGAGAACACTATAAACACCTCTATGCAAAT 

Sequence ID 369

GAAAGATCTAAAATCGACACCCTAACTVT 

GCAAATTCAAAAGCTAGCAGAAGGCAAGAAATAACTAAGATCAGAGCAGAGCTGAA 
AGAGATAGAGACACAAAAAACCATTCAAAAAAAAACAAT 

TTTTAAAAAGATCAACAGAATTGACAGACTGCTAGCAAGACTAATAAAGAAGAGAG 
AAGCATCAAATAGACTCAATAAAAAATGAT^^ 
AGAAATACAAACTACCATCAGAGAACACTATAA^ 
AAAAT 

Sequence ID 370 

GAAAGATCTAA2^TCGACACCCTAACATCACAATTAAAAGAACTAGAGAAGCAAGA 
GGAAATTCAAAAGCTAGCAGAAGGCAAGAAATAACTAAGATCAGAGCAGAGCTGAA 

TTTTAAAAAGATCAACA
Sequence ID 371 

GCCCGGAATCGCGGCCGCGTCGACGTAAGCTCGGCTGAATCCACGGTTCAAGAACA 
GGAAAGAAGGCCAAGGCATAGGGAGTGGGGCAGTTGGGTGAATATTAGTACCTTTC 

CCTCAGNTNGATTAATTACCCCTGCCTA

TCCTTTTTAATGGCCAGGTACAGCTGCTTATATGGANGGGCATTTNTNAATGATAT 
CCTTNATCACTGTCTTAATCATCACATNCTTAAAACAATCACTTTATTGTG 
GAAGATAAAAATGGCTGGGTTCAATTTCCGTTCTGGAAGAAATCGANTNAAAAGGT 
AACCATTTAATAATGCANAGGGCANTTTCACTGCAGACCCTAATACTGGAAATTTT 

TAAAAACAAATGAAAAACTTCTACTTTTTCTTCTAAGCTTACTTAACCACCCAAAT
TTTCCAGCCACATATCTTCCTAGTCTACAACTGCCTTTAACTTTAAGAGATGCTCA 
AAZ^AATGTAAATTCTCAAATACATTCTTATTACAATTACTGCTAACCT 
Sequence ID 373
CGAGTGTGCTGGK-IATTA 

ATGATATTGGTTCTCAACCAAGGGTGACTTTGCCCCCAGAGGATACTTGGCAATGT 
CTGGAGATACTCAGTTGTCATGACTTGGACAGGTGCTACTGTCACCCAGTGGGTAG 
AGGTCAGGGATGGTGCTAAACATAGGACAGCTGTCAAGAGAAAAGAATGTACCCAG 
CCCCAAATGTCAGTAGGGCTGAGGTTGAGAAACCCAGCTGTAGCTGACGTGTGAAG 
GACAGACTGGCCTGGAAGTGTGTTTTCTGCGCCTTTCCACCCCTGCATATTAGTTA 
AGGCCAAAGGAAAAAAGGAATGCAGGAAATGCCCGTTAAAAATCTTCAAAACAATA 
TAAAATGATCAATTCCACTAAAACCCTTTACACATTTAAGTATAAAGGTATTGGTA 
GGAA2\ATTTGTTATTCACTGCTTTTCTCAGTGTCATGAAATAATTATTTCTGCTGT 

CAGTTT 

Sequence ID 374

AAAAAAAAAATCACTGAAGGAATTATAGACCCAAATAAAAATAAATAAAAAGACAT 
TCTGTGTTTTAGGGAAAGAAGACTTAATATTGTTAAGATGTCAATACTACCCAAAG 
TGATCTACAGATTCAACATAATCCCTATCAAAATTCCAACAGCCTACTTTGTAGAA 
ATGGAAAAGCCAATTTTCAAATTCAGATGGAATTGCGAGGGGTTCTGAATAACAAA 
AACAATCTTGGGGAAAAAAAACAAAAAACAAAGTCAAAG 

TTATAATTTACTACAAAGTTATAGTAATCAAAGTCGACGCGGCCGCGATTCCGGG 
Sequence ID 378 

CGACTGCGGCTCTTCCTCGGGCAGCGGAAGCGGCGCGGCGGTCGGAGAAGTGGCCT 
AAAACTTCGGCGTTGGGTGAAAGAAAATGGCCCGAACCAAGCAGACTGCTCGTAAG 
TCCACCGGTGGGAAAGCCCCCCGCAAACAGCTGGCCACGAAAGCCGCCAGGAAAAG 
CGCTCCCTCTACCGGCGGGGTGAAGAAGCCTCATCGCTACAGGCCCGGGACCGTGG 
CGCTTCGAGAGATTCGTCGTTATCAGAAGTCGACCGAGCTGCTCATCCGGAAGCTG 
CCCTTCCAGAGGTTGGTGAGGGANATCGCCCAGG 



Sequence ID 380

GCAATTTAATTTTTAATAACAAAGATACTGTATTTTAACATGGTGAAATATACTTG 
GCTAAGTCCAGATTAAl^AAAAAAAAGTATCTAGCCCAACAGTACAATTATACAGCT 
TTGTACAGAACATTCCATAGATCAACAGAAAATACATTTGAGCGCAAAAATAAAAA 
ATATTTAAGGAGAATCTCTAAGCAGCATTTTATTTCTGCAAAAGACATATCTTGTC 
TGATTAAATATCTACAAGTGCTTTTCCTTTCAAAAATACATATATTCTTAATAGAC 
TAAGTCATTAACAATGACCTGGTAATTCTTTCACTTCAATTTGAATGATTTATAAG 
CTAAATCTTCAACCACAAAAAGGTTTTTATTTGTATTAAGATGTTACCACTTTTGA
CAAAAAGCTTAAAATATTTTATATTT CAAAGGAAAATTAGG^CATAACTTTACAA
TATATTCTATGATATTTTGATTGTGAGGGCTACTCTATTTAAAACTGATGATCTCT 
GTTGTGTTGCTCAGATGCAGGAAAGCAGCAAAA 

Sequence ID - 381 nt : 534 

GACTTANATCTAAATGGACCACATTCTCTACTTAAAAAAATGCTATTAACCATGTG 

ATCTTCTCAGTCATGAGGTAATCTGGTGACTACCCTTCCTCAAAGCCAGTTGGGAT 

ATTCTTTGAATAGAGTAAAACAGTGTTTCTAGGCTGGGAGACACCAGACATAGTTG 

AGGACAGAGGTGCTAGAAAATAGGAAGTTTAAAAGCATGTGCGGTGATGCTCAGAG 

GAGGTAAACCCC^CCCTCATGCT(^TAGCTTC(^ATCATTTTCTCTAGTTCTTAAC 

TCTTAAATGTGAGAAATGCTTGAAGATTCTAGTCATCTGAAGAAAGTCTCTTTATT 

AAAGATTTTCATAAAAGAGACCAAAGCAGACAAACAGAAAAA 

AAAAACAAGGATAATGGGAAGAGAAGGAAAGTTTTAAAA 

GGGGACAAAATATTATATCCTATAAAGAGAGATTTTTATTTTTTAAAAAAATAGAA 
AGCAAAACAAGCTCCTAAAAATAAAGTTTG 

Sequence ID - 382 nt : 444 

GTTAAGGAAGTCAGCACTTACATTAAGAAT^ATTGGCTACAACCCCGACACAGTAGC 

ATTTGTGCCAATTTCTGGTTGGAATGGTGACAACATGCTGGAGCCAAGTGCTAACA 

TGCCTTGGTTCAAGGGATGGAAAGTCACCCGTAAGGATGGCAATGCCAGTGGAACC 

ACGCTGCTTGAGGCTCTGGACTGCATCCTACCACCAACTCGTCCAACTGACAAGCC 

CTTGCGCCTGCCTCTCCAGGATGTCTACAAAATTGGTGGTATTGGTACTGTTCCTG 

TTGGCCGAGTGGAGACTGGTGTTCTCAAACCCGGTATGGTGGTCACCTTTGCTCCA 

GTCAACGTTACAACGGAAGTAS^2\ATCTGTCGAAATGCACCATGAAGCTTTGAGTGA 

AGCTTTTCCTGGGGACAATGTGGGCTTCAATGTCAAGAATGTGTCTGTCAAG 

Sequence ID - 383 nt : 566 

CTTTGAAGAACTTTGCCAAATACTTTCTTACCAATCTCATGAGGAGAGGGAACATG 

CTGAGAAACTGATGAAGCTGCAGAACCAACGAGGTGGCCGAATCTTCCTTCAGGAT 

ATCAAGAAACCAGACTGTGATGACTGGGAGAGCGGGCTGAATGCAATGGAGTGTGC 

ATTACATTTGGAAAAAAATGTGAATCAGTCACTACTGGAACTGCACAAACTGGCCA 

CTGACAAAAATGACCCCCATTTGTGTGACTTCATTGAGACACATTACCTGAATGAG 

CAGGTGAAAGCCATCAAAGAATTGGGTGACCACGTGACCAACTTGCGCAAGATGGG 

AGCGCCCGAATCTGGCTTGGCGGAATATCTCTTTGACAAGCACACCCTGGGAGACA 

GTGATAATGAAAGCTAAGCCTCGGGCTAATTTCCCCATAGCCGTGGGGTGACTTCC 

CTGGTCACCAAGGCAGTGCATGCATGTTGGGGTTTCCTTTACCTTTTCTATAAGTT 

GTACCAAAACATCCACTTAAGTTCTTTGATTTGTCCATTCCTTCAAATAAAGAAAT 
TTGGTA

Sequence ID 384

TTTTGGGGTTTATATATAAGCCTGGTTCTTGCTGAAACTGCTTATGTTGATAACCA 
GTTAGTGAGTTCCTCTCTATTGACTTGCTGGGAAGTTTATAGAGACATTTTTTATG
CATTCAGAGATTTCAGTACAAATCTTGAAAAAGGGACATTTAGGCCGGGCGCGGTG 
GCTCACATCTGTAACCCTAGCACTCTGGGAGGCTGAGGTGGGTGGATCATGAAGTC 
AAGAGATAGAGACCATCCTGGCAAAAATTAGCTGGGCGTGGTGGGGTGCGCCCGTA 
GTCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATTGCTTGAGCCCGGGAGGCGGAG 
GTTTCATTGAGCCGAGATAGTGCCACTGCACTCCAGCCTGGACAACAGAGCGAGAC
TGTGTCTT 

Sequence ID 386 

CTAAGGGTTTAAAGATGGAAAGAGGCATTGATGAACAGCTGGGGAAGGAGTAGTTT 
GAGGTAGATGTGCAGATGGAATGAAGAGAAGGTCTCAAGAAGAGGGTGGAGCCAAA
GAGGGCTGCAGATTTAGAAGGCTAAAGTCTTTAGATGGCTTTGGATAGCCTGTTGT 
ATCTTGGACCATGCAGGTTACAGTGGAGCATGGAGTGGGGACAGAAGTGGAGGAAG 
GAACCAGGGAACATGGAGTGAGAAGCTAAAGGAAAGTGATGCAGTAGATACATGGC 
TCTAAAGTACTCAGKIACTTTCAGAGGCTT^ 
ATGCCTGATACTAAGGGCATTCCCTGGATGTGGACCTTTCATTCCCCAAATTAGGA
AAGTCTTGGGCATACCAAGACAAGTTGGCCACCCTACTCAAAAGTATGTAAGCTAA 
CATATCTGTTCTCTAAGAGGTTAAAGCTGGATGGGGATACCAGATGTATGTACGTG 
ATGCAGTTAAACAGCAATACAAGGGGGCAAGTCTACCTGATCGGCCAATTCAATGG 

GA 


Sequence ID 387 

GAAGCCAAACCAAAGGAGCTTCTACTTCATGATGCCATTTATGTAAAGTTCAGGCA 
GAGAAAATCAGTGGTTTAAGAAGTTAGAATAATGATTATCTTTGGAGGGATTGCAA 
CTGGAAGAAGTCATGATTGGGATTTCTGGGTCCTAATAGTGCTCTGTGTCTTGATC 

TGAGTGCCGACTACATGAGTGGTTAGGTTTGCAAAATTCATTGAGTTATGCACTTA
ATGGTGTTGTCTTATTAGAGCTGATGGAGGAGAGAGGGCTTGAATTTGCACAACTG 
AGTAATCAGCTAGGCCCAGTCACTAGGTGAACAACTTACTGCTCCAATCAGCCTTA 
GAGC^GGAATCAAACTCATGTCTCAGAAAAGTTATTAATTCAGCTTGTCTTGGGAC 
TTCCTTCAGAGTCACTCTTGAATAGCTGAAATAGTAAATGTTAAATCTGTGGATGC 

AAGTGTGTAAATTATTTTAGTCATCAGCTCTAATAAGATGGCCTTTGGGGAAATGA
GTATAAGGTCACGAAAATGAAATGGCAAGAAGGAGGTCTACTATTTCTTCTGTAAT 
ACTGATTTTTACCCCATCAGGGTCAGTCCCCAGAGGTTGTAAATGTGAAGCTTG-T 
CTTTTTCTTTAATAA
Sequence ID 388 

CTTTGGACA.CTAGGAAAAAACCTTGTAGAGAGAGTAAAAAATTTAACACCCATAGT 
AGGCCTAAAAGCAGCCACCAATTAAGAAAGCGTTC^

AAAAATCCCAAACATATAACTGAACTCCTCACACCCAATTGGACCAATCTATCACC 
CTATAGAAGAACTAATGTTAGTATAAGTAACATGAAAACATTCTCCTCCGCATAAG 
CCTGCGTCAGATTAAAACACTGAACTGACAATTAACAGCCCAATATCTACAATCAA
CCAACAAGTCATTATTACCCTCACT^ 
AGGTTAAAAAAAGTAAAAGK^AACTCGGCAAATCTTACCCCGCCTGTTTACCAA?^^
CATCACCTCTAGCATCACCAG^ 

AACGGCCGCGGTACCCTAACCGTGCAAAGGTAGCATAATCACTTGTTCCTTAATTA 
GGGACCTGTATGAATGGCTCCACGAGGGTTCAGCTGTCTCTTACTTTTAACCAGTG 
AAATTGACCTGCCCGTGAAGAGGCGGGCATAACACAGCAAGACGAGAAGACCCTAT 
GGAGCTTTAATTTATTAATGCAAACAGTCCTAACAAACCCCAGGTCCTAAACTCCA
AACCTGCATTAAA 

Sequence ID 389 

CGACCCGGAATTCGCGGCCGCGTCGACTGAGTTCTTGACAAGAGTGTTTTTCCCTT 

CCCGTCACAGAGTGGGCCCAACGACCTACGGCACTTTGACCCCGAGTTTACCGAAG

AGCCTGTCCC(^^CTC(^TTGGC^^ 

GTCAAGGAAGCTGCCGAGGCTTTCCTAGGCTTTTCCTATGCGCCTCCCACGGACTC 
TTTCCTCTGAACCCTGTTAGGGCTTGGTTTTAAAGGATTTTATGTGTGTTTCCGAA 
TGTTTTAGTTAGCCTTTTGGTGGAGCCGCCAGCTGACAGGACATCTTACAAGAGAA 
TTTGCACATCTCTGGAAGCTTAGCAATCTTATTGCACACTGTTCGCTGGAAGCTTT
TTGAAGAGCACATTCTCCTCAGTGAGCTCATGAGGTTTTCATTTTTATTCTTCCTT 
CCAACGTGGTGCTATCTCTGAAACGAGCGTTAGAGTGCCGCCTTAGACGGAGGCAG 
GAGTTTCGTTAGAAAGCGGACGCTGTTCT 

Sequence ID - 390 nt : 523

GAATCCCTAGAAAAAGAGAATTCCCAACTTGATGAGGAAZyVCTTAGAACTGCGAAG 
GAATGTAGAATCTTTGAAGTGTGCAAGCATGAA2\ATGGCTCAGCTACAGCTAGAAA 
ACAAAGAACTGGAAAGTGAAAAAGAGCAACTTAAGAAGGGTTTGGAGCTCCTGAAA 
GCATCTTTCAAGAAAACAGAACGCTTAGAAGTTAGCTACCAGGGTTTAGATATAGA 

AAATCAAAGACTGCAAAAAACTTTAGAGAACAGCAATAAAAAAATCC^

AGAGTGAACTACAAGACTTAGAGATGGAAAATCAAACATTGCAGAAAAACCTAGAA 
GAACTAAAAATATCTAGCAAAAGACTAGAACAGCTGGAAAAAGAAAATAAATCATT 




AGAGCAAGAGACTTCTCAACTGGAAAAGGATAAGAAACAATTGGAGAAGGAAAATA 
AGAGACTCCGACANGAAGCAGAAATTAAAGATCCACATTTGAAGAAAATAATGTGA 
AGATTGGAAATTTGGAAAA 

Sequence ID - 391 nt : 566

CTTTGAAGAACTTTGCCAAATACTTTCTTACCAATCTCATGAGGAGAGGGAACATG 
CTGAGAAACTGATGAAGCTGCAGAACCAACGAGGTGGCCGAATCTTCCTTCAGGAT 
ATCAAGAAACCAGACTGTGATGACTGGGAGAGCGGGCTGAATGCAATGGAGTGTGC 
ATTACATTTGGAAAAAAATGTGAATCAGTC^^ 
CTGACAAAAATGACCCCCATTTGTGTGACTTCATTGAGACACATTACCTGAATGAG
CAGGTGAAAGCCATC^AGAATTGGGTGACCACGTGACCAAC 

AGCGCCCGAATCTGGCTTGGCGGAATATCTCTTTGACAAGCACACCCTGGGAGACA 
GTGATAATGAAAGCTAAGCCTCGGGCTAATTTCCCCATAGCCGTGGGGTGACTTCC 
CTGGTCACCAAGGCAGTGCATGCATGTTGGGGTTTCCTTTACCTTTTCTATAAGTT 
GTACCAAAACATCCACTTAAGTTCTTTGATTT^
TTGGTA 



Sequence ID 394

GACCCGGAATCGCGGCCGCGTCGACCATTTTAGCCAAGGTGCCTCTATAGGGGTCA 
AGACATCATGTGCCCAGACGTAAGGTCAGGAATGTCATATTTTTCTGTTAAAATCA
TTTTATTTCTGTGTATCTTACCTTTAAATCATTGTGGTTTACTCTGAGATTCTGTA 
GTCCTAATATTGTATCATTGTGCTGTCTGCAAAACAACTTGAATCTATTTTGTTTG 
CATCTTTTGTTACATGTAACGCAGCTGTACTTTATGTTCTTTGCAACTGTTTCCAT 
TATGAGAACGCTGTGCTATTTACAAGGTTACATTTTTCTTGGCCAGGCGAGGTGGT 
CATGCCTGTGATCCCAGCACTTTGGGAGGCCAAGGTGGGCGGATCACTTGAGGTAA
AGAGTTGAGACCAGCCTGGCTAGCATGGCGAAGCCCAGTCTCTACTAAAAATACAA 
AAATTGGCCGGGTGAAATTAGCCGGGCGTGGTGGTGTGTGCTTGTAATCCCAGCTA 
CTCGGGAGGCTGAGGCAGGAGAATCGCTTGAATCCGGGAGGCAGAGGTTGCAGTGA 
GCCAAGATCANGCCACTGCACTCCACCTCGGGGTCAAGAGCGAAACTCTGTCTCAA 


Sequence ID 395

CCGTTTTAGTCAGGATGGTCTCGATCTCCTGACCTCGTGATCCGCCTGCCTCGGCC 
TCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCGCCCGGCGTAAATCAGGTTT 
TTTAAATGTTTGCCAAACCTTATCACTGACTTTTATAACAAAATTATTTACTATAA 
TCATTAGGGAATATTTAAGTTCTGCTAATACTTAAAATTGCAGAGTGCTAAAACCA
GCAGTGAGTTTAGAATCAAGCTAAGCTTTATTGTTGCTACTATTTGAGGCATATTA 
GTTGACTGGTGTTCATATGCAAGGCAGTCTACTGGGTGCAACAAGGGTTAGAAGGA
TATTTTTAAAAAACTGACCCTATTCTCAGGATGAAAATAATACACTAGTAATAGTC
TGCTCTGTTGGTTAACTCCTCGTAAGGAGGTCAATTAAAATGCTGTAGTGTTGCAA 
GGGAAGGAGAGGAAGAATCATATTCCTTCACTAGCAGGATCAAGAAAGCTTTTATA 
GAAATATACAAAATCTTCACTTCTTGAAGGATTGGTAAAATTTAATAGCCAACATT 
GGGCACTTATTCATTCTCTGAGTAAATATTTATTGCAT

Sequence ID 396

CTTAAATCTAAATGGACCACATTCTCTACTTAAAAAAATGCTATTAACCATGTGAT 
CTTCTCAGTCATGAGGTAATCTGGTGACTACCCTTCCTCAAAGCCAGTTGGGATAT 

TCTTTGAATAGAGTAAAACAGTGTTTCTAGGCTGGGAGACACCAGACATAGTTGAG
GACAGAGGTGCTAGAAAATAGGAAGTTTAAAAGCATGTGCGGTGATGCTCAGAGGA 
GGTAAACCCCACCCTCATGCTCATAGCTTCCAATCATTTTCTCTAGTTCTTAACTC 
TTAAATGTGAGAAATGCTTGAAGATTACTAGTCATCTGAAGAAAGTCTCTTTATTA 
AAGATTTTCATAAAAGAGACCAAAGCAGACAAACAGAAAAAGACATCTTGGGGAAA 

AAAACAAGGATAATGGGAAGAGAAGGAAAGTTTTAAAAATTATCAATATCCTCAGG
GGGACAAAATATTATATCCTATAAAGACAGATTTTTATTTTTTAAAAAAATAGAAA 
GCAAAACAAGCTCCTAAAAA 

Sequence ID - 397 nt : 534 

GACCCGGAATCGCGGCCGCGTCGACGGAAGCTCCTGCCCCTCCTAAAGCTGAAGCC
AAAGCGAAGGCTTTAAAGGCCAAGAAGGCAGTGTTGAAAGGTGTCCACAGCCACAA 
AAAGAAGGAGATCCGCACGTCACCCACCTTCCGGCGGCCGAAGACACTGCGACTCC 
GGAGACAGCCCAAATATCCTCGGAAGAGCGCTCCCAGGAGAAACAAGCTTGACCAC 
TATGCTATCATCAAGTTTCCGCTGACCACTGAGTCTGCCATGAAGAAGATAGAAGA 

CAACAACACACTTGTGTTCATTGTGGATGTTAAAGCCAACAAGCACCAGATTAAAC

AGGCTGTGAAGAAGCTGTATGACATTGATGTGGCCAAGGTCAACACCCTGATTCGG 
CCTGATGGAGAGAAGAAGGCATATGTTCGACTGGCTCCTGATTACGATGCTTTGGA 
TGTTGCCAACAAAATTGGGATCATTTAAA.CTGAGTCCAGCTGCCTAATTCTGAATA 
TATATATATATATATATCTTTTCACCATAA 


Sequence ID - 398 nt : 512

GGGGAGCCCCCTCTTCCCTCAGTTGTTCCTACTCAGACTGTTGCACTCTAAACCTA 
GGGAGGTTGAAGAATGAGACCCTTAGGTTTTAACACGAATCCTGACACCACCATCT 
ATAGGGTCCCAACTTGGTTATTGTAGGCAACCTTCCCTCTCTCCTTGGTGAAGAAC 

ATCCCAAGCCAGAAAGAAGTTAACTACAGTGTTTTCCTTTGCACCGATCCCCACCC

CAATTCAATCCCGGAAGGGACTTACTTAGGAAACCCTTCTTTACTAGATATCCTGG 
CCCCCTGGGCTTGTGAACACCTCCTAGCCACATCACTACAGTACAGTGAGTGACCC 
CAGCCTCCTGCCTACCCCAAGATGCCCCTCCCCACCCTGACCGTGCTAACTGTGTG
TACATATATATTCTACATATATGTATATTAAAACTGCACTGCCATGTCTGCCCTTT 
TTTGTGGTGTCTAGCATTAACTTATTGTCTAGGCCAAAGCGGGGGTGGGAGGGGAA 
TGCCACAG 

Sequence ID 399

TTTTGGCATTACTTAATCCAATTATAAAAACTGAATTTTTAAAAAACAGCACTTGT 
TTTTTCTTCCAAGATTAATTTGAATTTTTTTATGGACATTAGAAAACATTGCAGTT 
TAGTCATAATCAAAAATAAATCTTGAGGCTGGTAGAGCAGCTTTGTTGCTGTTTAT 
ATTTTTATTGCTTACTGGATTTCAGTGTTACCTAGTGCCATCAGTTTGGTATTTTG 

TTTAAATCCTGAATAGTTTGTGGCAGCTGGAGATGACCTAGTCCACCACTGTCCAA 
CATGGCAATGGTAAGTAATATTGAGTAAAGAATAGAAAATTAGTAAAATGCATGGC 
TTCAGAATTATAGC^TTTGCAAAATAGGTTAATGGATGAAAATTAGAATGACCAG 
TTTAACTTTCCCCCCAGCAGATTCTTCTGTTAAACAATGCCCCTTCAAAATAAAGG 
AAGAACAAGTGGGTGTTATACCTATGTTATTTGGCTATGTTAGCACAATATGATGG 
ACTAATTTGAGAAAAAGCATTTACTTCCTTTACTATTACTTCTTTTCTTTATAGGG 
CTAAGTCTGCCTTCTGGGTCTTTGAA 

Sequence ID 400 

GAAGAAGCGCGAAGAGCCGTTAGTCATGCCGGTGTGGTGGCGGCGGCGGAGACTGC 
GGGCCCGTAGCTGGGCTCTGCGAGGTGCAAGAAAGCCTTTGAGGTGAAGGTGTATG 
AAAGTCATCATAACAGATGTTTTCCAAAAACTTGTAGAAGGTTGTGAZy^AACTAC 
TAGGATCACGCGGCATGTATTGAGCATATAGGTTGCTGTAGATGAATGTTCTTAGC 
TGTCATGTTTAAAAATACTTCTGCTTCGTTACCTCAAGTGTGGCATGCAGCATTTT 
GGAAGGAAAATTGAAGACGTGTTCAAGAAAAGATGAACAGAAGCAAATGATGAAAA 
TGAGCATTTTACTTGATGTTGATAACATCACAATAAATTATGGAGAAAAATACATA 
TTTGGCTAACTTTTAATTGCTGAACAATAAAGTGTTTTCTTTTAAATCNAAAAA 

Sequence ID 401 

GAAGCCAAACCAAAGGGAGCTTCTACTTCATGATGCCATTTATGTAAAGTTCAGGC 
AGAGAAAATCAGTGGTTTAAGAAGTTAGAATAATGATTATCTTTGGAGGGATTGCA 
ACTGGAAGAAGTCATGATTGGGATTTCTGGGTCCTAATAGTGCTCTGTGTCTTGAT 
CTGAGTGCCGACTACATGAGTGGTTAGGTTTGCAAAATTCATTGAGTTATGCACTT 
AATGGTGTTGTCTTATTAGAGCTGATGGAGGAGAGAGGGCTTCAATTTGCACAACT 
GAGTAATCAGCTAGGCCCAGTCACTAGGTGAACAACTTACTGCTACCAATCAGCCT 
TAGAGC^GGAATCAAACTCATGTCTCAGAAAAGTTATTAATTCAGCTTGTCTTGG^ 
ACTTCCTTCAGAGTCACTCTTGAATAGCTGAAATAGTAAATGTTAAATCTGTGGAT
GCAAGTGTGTAAATTATTTTAGTCATCAGCTCTAATAAGATGGCCTTTGGGGAAAT 
GAGTATAAGGTCACGAAAATGAAATGGCAAGAAGGAGGTCTACTATTTCTTCTGTA 
ATACTGATTTTTACCCCATCAGGGTCAGTCCCCAAAGGTTGTAAATGTGAAGCTTG 
GTCTTTTTCTTTA

Sequence ID 402 

GACCCTATTCTCAGGATGAAAATAATACACTAGTAATAGTCTGCTCTGTTGGTTAA 
CTCCTCGTAAGQAGGTACAATTAAAATGCTGTAGTGTTGCAAGGGAAGGAGAGGAA 
GAATCATATTCCTTCACTAGCAGGATCAAGAAAGCTTTTATAGAAATATACAAAAT
CTTCACTTCTTGAAGGATTGGTAAAATTTAATAGCCAACATTGGGCACTTATTCAT 
TCTCTGAGTAAATATTTATTGCATGCTTATCTTGTATCAACATTGNGATGAAAGCN 
CAAGAATGAAAGAGGAGGGAGAATGTTTANAGAATAAGGCTGAAACACAG 
TAGGGAGCGTAGGGGAGACTGANAAAACAG 


Sequence ID 403 

AAGACACCTGATAGATTGTCTTGTATTATTTTTCCTTTGCCTTCTTACAATCTCAG 
TGATTAGAATTGGGCTGAAAACAATACATCAAATTCTCAGCAJ^ 
TGCTGGATACCGAGGGTTTTTAAGATCTTTAGACTTCACTATATAGAACAAATGTT 
GAATGGGAATTTTCTTTATTTCTATANCGTTTNG

Sequence ID 405

CCCGGAATCGCGGCCGCGTCGACGATGAGCATTTTTTCATGTGTCTTTTGGCTGCA 
TAAATGTCTTCTTTTGAGAAGTGTCGGTTCATATCCTTTGCCCACTTTTTGATGGG 

GTTGTTTTTTTCTTGTAAATTTGTTTGAGTTCATTGTAGATTCTGGATATTAGCCC

TTTGTCAGATGAGTAGGTTGCGAAAATTTTCTCCCATTTTGTAGGTTGCCTGTTCA 
CTCTGATGGTAGTTTCATTTGCTGTGCAGAAGCTCTTTAGTTTAATTAGATCCCAT 
TTGTCAATTTTGGCTTTTGTTGCCATTGCTTTTGGTGTTTTAGACTTGAAGTCCTT 
GCCCATGCCTATGTCCTGAATGGTAATGCCTAGGTTTTCTTCTAGGGTTTTGATGG 

TTTTAGGTCTAACGTTTCAGTCTTTAATCCATCTTTTAAAAGTCTCTTCACAGTAC

ATGAGTAGTAGTGACACCAATAATGTCAGAGCAGGGAACTCCCAGGTTCTGCCCAT 
CCACAAAAACAACAAATAAGCTGGCAAAAACT 

CTGAAATCTAGTCAAAACTTAAACAGAGGAAAGATTAATAAAGACNGGCTGCCTGA 
GATAACACTAACACACAC 


Sequence ID 406 

CATCAAATAAATAAATAAATAAATTTTAAAAGT 
TTGGGATGATAAAGCACCTGCTTATCATGAAGCTANAGAAATTCAATGACACGTTT
GCCAGGGTCTTTGCTAGTGATGTTGGAACAAGTCTGTAATGCTGATGAAACATCAC 
TGTTCGGGCATTATTGCCCCAGAAAGACACTGACTGCAGCTGATGAAACAGCCCTT 
CCAAGASlTTAAGGATGCCAAAGACCAAATAACTGTGCTGAGATATACTTACGCAGC 
AGGCATGCATAAGTGTAAACTTGCTGTTATAAGCl^AAAGCTTGCGTTCTCACTGTT 
TTCAAGGAGTGAATTTCATACCAATCCATTATTATGCTAATAAAAAGGCATGGATC 
ACCAGGGACATCTTTTCAGATTGGTTTCACAAACATTTTGTACCAGCAGCTTGTGC 
TTACTGCAGGGAAGCTGACTGGATGATGACTGCAAGATTTTGTTATATCTTAACAA 
CTGTTGTGCTCATCCTCCAGCTGAAATTCTCATCAAAA^TAATGTTTATGGCTCAC 
ACCTGTAATCTCAACACTTTGGGAGGATTGCCTGACCCAGGAGTTCAAGCCCACCC 
TGGGCAACACAGCAAGACCCAACCTNTC 

Sequence ID 407 

TTTTAAAAATCATAAAACGTTTCTTACAAAAGAGCATTACATTNTGCACACTGCTC 

TGAACAGATGCCAGGGACATGTGGACTATTGTTACTTTTCCTCCCTGTCCCACCCC 

CCAAATGTTACAGTGACCACAAAGCAAGGTGTTCACAATAATTACATGGGGGGAAT 

TTTTTAAACCACCAACAATAACGAAAAATAAAATCCACTCACTCTGCTGCTGTTTC 

AAAATTTCAATGTTAGTTTTTGCACGCCCTTCCCCCCCCCAACCCTGTTTGTAAGG 

AACTAAAACATTACATCTGGTGAACAGCAAAGATTTCACTACACCTCAAATGCAGA 

ACACCTATGAAGCAGAGGAATGTTGGCTTTTTAAACAGAAGCAGATAAAAAAAAAA 

GATGCAGGACTCCTTCAGTTCTTCACTAGTCTTAGAAAAACTTTCCAGAATACTGC 

TTCACACTATAAAAAAGAAAAAATATCTTGCATTAGAATCCTTCAACATCTGCATA 

CTGCTTCACACTGTTCGTTTCTAGGAGCACTTTGTCACAGGACACTTCTGCTTATA 

TTTCTTTAATCAGAACTTAGTTGGATGGGCCGGGCATGGTGGCTCACGCCTGTAAT 

CCCAGCACTTTGGGAGGCCGAGG-GGGTGGATCACC 

Sequence ID 408 

CCATCTCCAAATTTAGTATTCATTCTGTTTAGCATATTATCAGTTGCCATCTATTT 
GTTTTAACTGATTACTTGAATCTGATTAAACATCACAGAAATGGGCTTTGATAAGA 
ACAATATTGAATAAGAAATTTTAAATAACAAAACAGCTTATAGAAAAATTCAGCAT 
AACTTTTCCATCACCTTCACCACCCTTGCCTTTTATTATCCTGTCCTGTATCACTG 
CTTTCTGTTAGCAGTGTTGTGTGAGTTAGGATTTGGGCAGGAAAGCAAAAGCAACC 
ACCCGTCATTTTCCCAGAATGAAGGGTTTGACGTAGGATGTAGACTTTGTATAGTA 
GTTGGGAGAGCTGTGGGAGTGAAGGTCAGGGATGTCACCTACAGAAGTCAGGGAAT 
CTGCCACCAGAGATCCTGCATCAGAAACAGCCAACAGCGTGCTTCTGAAGAACTAG 
TGGGGAAGTGGCTATAATTCTTAGGAATCCCAGCAAGTCCGCACCACTGTCTCAGT 
CTACAGCAGTGGAGAAAGGGGTTTCCAGGAGCTCTCTGGAAAGTTCCTGCCCACAC 
TTTGClAACAATCTTCAGAGGATAATGGGCTTCTCTTCCAGCTTCCACACCCAACAA
GAGTGCCTTTCATCGGCCAACTCTAACCTGGAACCCTATGGCAGAGGGGATTTAGG 
AGACAGTTTGTNATGTCTGTGGAATGCAAATGAANANGTANCAATGCTTANTTGAC 
AGCGGNCATACACAAATNTNGAAA 

Sequence ID 409 
GAT CCGTNG ACT 

Sequence ID 410 

CTCTTCCCAGCCCCTGAGCCCAGCCCCTTCCCAAGTGGTGCCAGACAAAAAACTAC 
ATGGCCCTTTCGTGTCTTGGGGGTGGAAAGGGAGGGATGAATTGGGGTGATAGAAC 
CCTGGTGAATTCAGAGTAATCTTTCTTTAGAAAACTGGTGTTTTCTAAAGAAACAG 
GATAGGAGTTTAGAGAAGGCACCAAAGCTTTGACTTTGGTTTGGCACCAGTTTCTA 
ACCATCTGTTTTTTCTACCCTAGCTATCTTTTATTGGTAAAATATAAATGTATAAT 
TATGTTTGTAGAGCTTTACCAAGGAGTTTCCCTCCTTTTTTGTTTGTTGATTAGCA 
AATTTTTGATTCTCCATTTTCCAAAAGTAAGAGACTCCAGCATGGCCTTCTGTTTG 
CCCCGCAGTAAAGTAACTTCGATATAAAATGGTATTTGAAAGTGAGAGTTCATGAC 
AACAGACCGTTTTCCATTTCATCTGTATTTTATCTCCGTGACTCCACTTGTGGGTT 

T 

Sequence ID - 411 nt : 505 

TGGAGCTGAAAAATTCCTATTACCTAGGGGCATCACAACGCATTGCATTTCGCCCG 

TGTTTGGGATGATGCTGGTGTAAACCTACTATGCTGCCAGTCATGTAAAAGTATAG 

CACACACAATTAGTAGGTAATGCTTGCAAATAATAATGAAAGACTCTGCTACTGGT 

TTATGTATTTACTATGCTATACTTTTTGTCATTACTTTAGAGTGTACTCCTACTTT 

TTTTTTTTTTTTTTTTGAGATGGAGTTTCACTCTTGTCCTGTAGGCTGGAGCGAAN 

TGGCGCGATCTCGGCTTACTGCAACCTCCACCTCCTGGGTTCAAGCGATTCTCCTG 

CCTCANCTTCCCAGAGTAGCTGAGATTACAGGCATGCACCGCCACGCACGGGTAAT 

TTTGTATTTTTGGTAGAGACAGGGTTTCACCATGTTGGCCAGGCTGGTCACCAACT 

CCTGACCTCAGGTGACCCGCCTCCTCACCTCCAGAGTGTTGGGATTACAGGNGTGA 

G 

Sequence ID 412 

ATAAAAATTAGCTGGGGGTGATGGGCCCTGTACCCCAGCTACTCGGGAGGTGAGGT 
AGGAGAATCACTTGAACCCGGGAGATGGAGGTTGCAGTGAGCCAAGATCGTGCCAC 
TGCACTCCAGCCTGTGTGACAGAACAAGACTCTGTCTCAAAAAAAAATAATAATAA 
TAATAATAATAAAAAGGAATAACATAGCTAGGAATAAATTTAATCAAAGAGGTGAA 
AGACTTATACACTTAAAACTACAAAAAAAAAATCACTGAAGGAATTATAGACCCAA
ATAAAAATAAATAAAAAGACATTCTGTGTTTTAGGGAAAGAAGACTTAATATTGTT 
AAGATGTCAATACTACCCAAAGTGATCTACAGATTCAACATAATGCCTATCAAAAT 
TCCAACAGCCTACTTTGTAGAAATGGAAAAGCCAATTTTCAAATTCAGATGGAATT 
GCGAGGGGTTCTGAATAACAAAACACAATCTTGGGGAAAAAAAACAAAAAACAAAG 
TCAAAGAACTCACACTTCTCTATTTATAATTTACTACAAAGTTATAGNATCAAAGT 
CGACGCGCCGCGATCCGGGC 

Sequence ID 413 

CACAGl'ACTCCATTTTGGGGTCCAAACTGTAATGCTCAAAATAATAAATGCTTACA 
CGAAAATTATTTATTGAGAATATTCATATAAAAATTACCTAAAGCAAAGTAAAAAA 
AGTAAAATCAAGGTGGTATATTTGAAGTGAATGGTGATTGGAAATTTTTAGCTGTA 
ACAAAAAGAAAGAAAACAACTTTTTTTAAAGCCTCATTCTCTTTTCTTTCAAAATG 
TACCTTATTCCCACACACTCTTGGGCTGACCTTTATTTTATCAATAAGCTCAATAT 
TACTTTGTTTAAAATAAGATGCTTCAGCAAAAGTCATTCTCTCTTTAACCATATAA 
TTTAAAAACTCCTCTTCACGATTGATAGCAAAATCAGAAACGTTAGGGCACCAGTG 
AGTTGAAAAAACTGGTCTTAAGTTGGAAAAACTATTATTAATAATATTATCCTATC 
CATCCATATCTATTGAAATTGTGAGGTCCATAATTTCATTTTAATTAATTATAGGA 
AAGAAGAAAAGATAATACCCATTTGTTCTAT 



Sequence ID 414 

CTCAGACTCTTTCTGCCCTAATGGCCATTACTATCCAGTCTGTATTGCTACAAGGG 
ACCCACTGGTACCCCTTTTAGATTCTATCAAAAGGAACAGGGTTTTCCTAGAGGCA 
GGCAGCCTGGTGGTATGGCACAGCAGAAGCTTACTGCTAATGAAATGGGAACCTCC 
CCCTCCCTTGTGGTTTCAGCACAGAACCTGAATGCCAGGAAAAATTCCTGGGCCAA 
GAAGCTAAAGCTAAAGAAACCTTCCTTTTTTCAACGTTTTTTTTTCTTTCAAACTG 
TAGGGTCACTTTTGATTGAGGCAAAGGGGTCCTACTGTAAGTGGAAAAGACTCACT 
CCCCTAACATAAGTTTTCACTGTGGTGGGATGGTGCCGCCCGATATGCTTGATATG 
CTTTTCCTTCCACATGTTAAGCTAGGAAACCTAACAGGATGTCAGCAGGGCAGTTA 
ACTCTGGACTCANAGCCCTCAAGGGCATGTGGCANAACCTCATGGCATNCAAGACC 

A 



Sequence ID - 415 nt: 596 

GTATAATTGATTCTTTTGAACCTAAAGTATAAGACTTCACGATTAGAAAAAAATTA 

TCCAAAGACTAATGTAATTAAGTGAGGAAAAGGTGCTGGAGGAACTGGATAACCAC 

ATGGAAATGTATGAACCATGACCTCTATGTCACATACTATATATAAAACTTAATTT 

GAGGTGTATCACAGAGCTAACTGTGGGGGCTAAAACGTTGAAGCCTTTGGATGGCC 
GCACAAGAGATGTCTGCATTCATAACCTTGGGGAGGGTATGAACATTTCTTGGTAA
CATGCAAAAAGCACTAACTGTAAAAGAGAACAGTTGGTCAGTTGAATTTCATGAAA 
CATTGTAAACTTCTGCTAAACAACTGACACCATTAAGAATGTGGAAAAAGGCTGGG 
CACAGTGGCTCATGCCTATAATCCCAGCATTTTGGGAGGCCGGGGCGGGAGAATCA 
CTTGAGGCCAGGAGTTTGAAACCAGCCTGGGCAACATGGCAAGACCCCGACTCTAC
AAAAATATTTTTAAAAATTAGTTGGGTGTGGTGATGCACTCCTGTAGTCCTAGCTG 
CCAGGANGCTAAGGNGGAAGGATCACTTAACCCTGG 

Sequence ID 416 

CTGGTGGCGGCGGTCGTGCGGACGCAAACATGCAGATCTTTGTGAAGACCCTCACT
GGCAAAACCATCACCCTTGAGGTCGAGCCC 

CAAAATTCAAGACAAGGAGGGTATCCCACCTGACCAGCAGCGTCTGATATTTGCCG 
GCAAACAGCTGGAGGATGGCCGC^CTCTC 

ACCCTGCACCTGGTGTTGCGCCTGCGAGGTGGCATTATTGAGCCTTCTCTCCGCCA 
GCTTGCCCAGAAATACAACTGCGACAAGATGATCTGCCGCAAGTGCTATGCTCGCC
TTCACCCTCGTGCTGTCAACTGCCGCAAGAAGAAGTGTGGTCACACCAACAACC^ 
CGTCCCAAGAAGAAGGTCAAATAAGGTTGTTCTTTCCTTGAAGGGCAGCCTCCTGC 
CCAGGCCCCGTGGCCCTGGAGCCTCAATAAAGTGTCCCTTTCATTGACTGGAGCAG 

Sequence ID 417

GCAGGGGCTTCTGCTGAGGGGGCAGGCGGAGCTTGAGGAAACCCGCAGATAAGTTT 
TTTTCTCTTTGAAAGATAGAGATTAATACAACTACTTAAAAAATATAGTCAATAGG 
TTACTAAGATATTGCTTAGCGTTAAGTTTTTAACGTAATTTTAATAGCTTAAGATT 
TTAAGAGAAAATATGAAGACTTAGAAGAGTAGCATGAGGAAGGAAAAGATAAAAGG 

TTTCTAAAACATGACGGAGGTTGAGATGAAGCTTCTTCATGGAGTAAAAAATGTAT

TTAAAAGAAAATTGAGAGAAAGGACTACAGAGCCCCGAATTAATACCAATAGAAGG 
GCAATGCTTTTAGATTAAAATGAAGGTGACTTAAACAGCTTAAAGTTTAGTTTAAA 
AGTTGTAGGTGATTAAAATAATTTGAAGGCGATCTTTTAAAAAGAGATTAAACCGA 
AGGTGATTAAAAGACCTTGAAATCCATGACGCAGGGAGAATTGCGTCATTTAAAGC 

CTAGTTAACGCATTTACTAAACGCAGACCAAAATGGAAAGATTAATTGGGAGTGGT

AGGA 

Sequence ID 418 

CCCGGAATCGCGGCCGCGTCGACGGGAGGTGATAGCATTGCTTTCGTGTAAATTAT 
GTAATGCAAAATTTTTTTAATCTTCGCCTTAATACTTTTTTATTTTGTTTTATTTT
GAATGATGAGCCTTCGTGCCCCCCCTTCCCCCTTTTTTGTCCCCCAACTTGAGATG 
TATGAAGGCTTTTGGTCTCCCTGGGAGTGGGTGGAGGCAGCCAGGGCTTACCTGTA 
CACTGACTTGAGACCAGTTGAATAAAAGTGCACACCTTATAAAAAA
Sequence ID 419 

CCCGGAATCGCGGCCGCGTCGACGGGAGGTGATAGCATTGCTTTCGTGTAAATTAT 

GAATGATGAGCCTTCGTGCCCCCCCTTCCCCCTTTTTTGTCCCCCAACTTGAGATG 
TATGAAGGCTTTTGGTCTCCCTGGGAGTGGGTGGAGGCAGCCAGGGCTTACCTGTA 
CACTGACTTGAGACCAGTTGAATAiyy^GTGCACACCTTATAAAA 

Sequence ID 420

CTTCATTTGAAATGGTTGAATCTGCTGTGTAATAAAGTGGTTCAACCATGATTAGG 
AACTGAAATTTAGTAGAAGAGGGAAAAGGAGTTAATGTAACAAATTATTTTAGCTA 
CAAACCCCGGTAATAGAGCACTTCSK^GGATGGGATGGGGTGGGTTGGTGAGACAAT 
CAGAATGGTAAATTGATTAAATGCTCCTAACCCTGTAATTTTGTGCATAGAGCACC 

CTATGCTGTGGAAATAACTGTTCTTAGATTTCATTGTAACTGGACTGTTCAGGTTG
CCCAGAGGGAAAGAACATTCCTAATTCTAATAAAATAAACTTTTATTTTGTTTA 

Sequence ID 421 

TGTCATTGAATCTGCTTGTTACTTAAATGCTAAACTCAATTCTGTAATTCAATAGG 
TGCACCTCTCTGAGAAACATAAGAGACAATGAGGAAAAGGATTCAGCATTCCGTGG
AATTTGTACCATGATCAGTGTGAATCCCAGTGGCGTAATCCAAGTAAGATGTTCAC 
AAAGATTTGTTTTTAATGTCTAATTAATAAAATTTTAAAGGAAGAAACATTCTAAT 
ACTTTAATTATAAAAAGTTAACTATTTTCAAAGGTATCAAAATACAGTTAAACCTT 
TAAAATGTATATTTCTTAATATCTTGAAATTGTAATGCCTTTTTTTTTTCCTAAAT 

TTTTTTTGTCATGAAATGAGATAGTAACAGCAGATTGGGACAACAAGGTTATATTC

TTGTCTTGAATC21GGCCATGGCTTCT 

CTTTGTCCCTGCCTCCCATCCCTGGATATCAGTTTGTGGATATCTACAGTTAATAG 
AGTGACCAAATAGTAGGAATACTGTCTCTCTATTCTGAATAAAATCTTTGAATCAG 
ATTTAGAAATAATGAATAAAATACAAATCAGCCATTGAAATTGCTCTAATTTTGAG 

AGCTTATGATTTATTCATCTTTGGTTTCCAAGTTCAAGTTATATGTAGACATTTTA

ATT 

Sequence ID 422 

GCTTCCTAGGTGAGGTCACGAGGAAACCTGCTGGCCAAGTGACCTGGCAGGGTGTG 
GCCAGTGTGGCCAGGGCCGCCGAGCCTGCTTTCCTTCCCTGCAGCAGGAACCCTTC
TGGGGCTGTGATCCTGCGATGGTGCCTGGGTGGGAGTGGGGGTGGGGGGCGGGATG 
GTCTCCCTACCTGCCAGCTTCTTGGTTTGAGGTGAGGACAGCCCCGGAAGCTCANA 
CTTGGCTCCTGTCCATGTACTTGGGGCCATGAGCTCTGCAGGGACCTTGGAAAGAN

AGAGACGGGTGGTGTANGK3CANGGGAAGGCATT 

NAATGGAAACAGGCGAAACTTACCAAGT 

GGGAAGGTTTTAATTATTTTAAAAATAGAGATGGGGTCTCACTATGTTGCCCAGGC 
TGGTCTCAAACTACTGGGCTCAAGTGAACCTCCTTCT

Sequence ID - 423 nt: 387

TGTTTCTCNAGGGCGAGAGGCTGTCTTANAGCACCATTCTCTGGCCCTNGTCCCAT 
GAGAAGGAACCGC^CTCAGGAGCCACACTCT 
ACAGAGGGCACGGAGCTGGCTGTGGTGAGAGGAGGTCCANCAAATTCCTGTCTGC^
NAAGGGTTCTGAACACCACCGCCTGGCAGCGTGCTGGAGGAGGGATTCCTCTTTTC 
CTCACAGCAATTCTGACCAGAAACCTGTCAAATCAGGAATGGCTAAAAT 
GGGTATGAATGACCATCAGCCACAGTAA 

CCAAGCTGCTGTGGCCCAGACTGGTGACATCACCTCAGGGCAAAAAAAAAA 


Sequence ID - 424 nt : 420 

CGCAGAATGGCTCCCGCAAAGAAGGGTGGCGAGAAGAAAAAGGGCCGTTCTGCCAT 

CAACGAAGTGGTAACCCGAGAATACACCAT 

TGGGCTTCAAGAAGCGTGCACCTCGGGCACTCAAAGAGATTCGGAAATTTGCCATG 

AAGGAGATGGGAACTCC^GATGTGCGCATTGACACCAGGCTCAAC^^AGCTGTCTG

GGCCAAAGGAATAAGGAATGTGCCATACCGAATCCGTGTGCGGCTGTCCAGAAAAC 
GTAATGAGGATGAAGATTCACCAAATAAGCTATATACTTTGGTTACCTATGTACCT 
GTTACCACTTTCAAAAATCTACAGACAGTCAATGTGGATGAGAACTAATCGCTGAT 
CGTCAGATCAAATAAAGTTATAAAATTG 


Sequence ID 425 

GGAAACTGATGCCAGTCAGAAACTCAGATCAAATGAAGGGGTGAAGAGAACCAGAA 
TTGATCTCTCTGTAGGAGAATATAAATGACTTTTTTAAAGTACATATTTTCTGTGA 
AAGACAGTTTTTTGTTTAATGCAAAAATGTTAAGAATGTTTATATCATGTAGAAGT 

AAAAGATCGTGAAACAGCACAGAGAACAGTAGTAAGACAGATTGAATTGCACTGTT

GTAAGATGATGAACTTACAATATTAAGTGAAGGTAGACTGTGATAGATTAAGGATA 
TATATTGTAATCCCTAGAGCAATTGTCAAAGTGGTACAGGTAAAAAGCCAATAGAG 
GTGATAAAATGGAATACTAAAAAATATCAGATGAATAATAAAGAAGACAGGAAATG 

AGGAACAGTGGAACAGAATGAATAAAAAAC^^ 
TTACTTTAAATGGGTTAAACATTATGGTTATAAGGCAGAGATTTTCAGACTAGATA
AAAGAGCAAGCTCCACTATATACTGTCTACAAGAGATATACTTTAAAGTGTATATT 
ATATTTAAATATAAAGATTTGGAATAAATAAACCTAAGAATAAGCTTACTAGGGAA 
GTGAAAGATCTGTACAACAAGAATTACAAAACACTGCTGAACGAAATCATAGGTGA
CCA 

Sequence ID 426 

GTCCCGGAATCGCGGCCGCGTCGACGTTTCCTCAAAATTTATCTTCCTGTTAATGT
CAGGCATGTATCTCCTTAGCTTGCCACAAATAACTATATATACCACAGACCTTCCT 
TTGTAGGGCTAACAGTGTTGCATTGTAAGTGGAGGCCTCATAGATACCTGGCCTTT 
TCCTACCTTATTCO^AAGATGGTTGCATCTTATAAATAATGTCATTCTTCAGCAAA 
TGGTATGGAAATGAGATTGTAATGTCATTATTTCCTCTTTAAATAATCAGGACAAC 

TCATGATAC^AAGAGCTCTTCTCTATAAAAGGTGGGACTTTTTTTTTTAGTAATAG
CAAAAATAAAATTGTACCTCCTTAATCTTCTACAGAAAGATGGATTTCATTTTCAA 
CATTAAGAGGTAGTTTTAAGAAGCAGTAGAAGTCAGCCTGGGCAGCATGGTGAAAC 
CCCGTCTCTACAAAAAAGTTAGCTGGGCTTAGTAGTTGCAATCCCAGCTACTCTGG 
AGGCTGAGGTTGGAGATCATCTGANCCTGGGGAGGTCNAGGCTGCAATGATACANT 

GAGCCCTGATTGTGCCACTCCACCTGGTTGCAGA

Sequence ID 427 

TTCCAATCTTCGTGTTCACTTTAAGAACACTCGTGAAACTGCTCAGGCCATCAAGG 
GTATGCATATACGAAAAGCCACGAAGTATCTGAAAGATGTCACTTTACAGAAACAG 

TGTGTACCATTCCGACGTTACAATGGTGGAGTTGGCAGGTGTGCGCAGGCCAAGCA
ATGGGGCTGGACACAAGGTCGGTGGCCCAAAAAGAGTGCTGAATTTTTGCTGCACA 
TGCTTAAAAACGCAGAGAGTAATGCTGAACTTAAGGGTTTAGATGTAGATTCTCTG 
GTCATTGAGCATATCCAAGTGAACAAAGCACCTAAGATGCGCCGCCGGACCTACAG 
AGCTCATGGTCGGATTAACCCATACATGAGCTCTCCCTGCCACATTGAGATGATCC 

TTACGGAAAAGGAACAGATTGTTCCTAAACCAGAAGAGGAGGTTGCCCAGAAGAAA
AAGATATCCC^GAAGAAACTGAAGAAACAAAAACTTATGGCACGGGAGTAAATTCA 
GCATTAAAATAAATGTAATTAAAAGG 



Sequence ID 428

TGCAGGATCCGTCGACTCTAGATAACATGGCTAGAAAAGAGAATGAAAAAGTTGGA 
ATTTTTAATTGCCATGGTATGGGGGGTAATCAGGTTTTCTCTTATACTGCCAACAA 
AGAAATTAGAACAGATGACCTTTGCTTGGATGTTTCCAAACTTAATGGCCCAGTTA 
C^TGCTCAAATGCCACCACCTAAAAGGCAACCAACTCTGGGAGTATGACCCAGTG 

AAATTAACCCTGCAGCATGTGAACAGTAATCAGTGCCTGGATAAAGCCACAGAAGA
GGATAGCCAGGTGCCCAGCATTAGAGACTGCAATGGAAGTCGGTCCCAGCAGTGGC 
TTCTTCGAAACGTCACCCTGCCAGAAATATTCTGAGACCAAATTT 

Sequence ID - 429 nt : 535

CACAGTACTCCATTTTGGGGTCCAAACTGTAATGCTCAAAATAATAAATGCTTACA 
CGAAAATTATTTATTGAGAATATTCATATAAAAATTACCTAAAGCAAAGTAAAAAA 
AGTAAAATCAAGGTGGTATATTTGAAGTGAATGGTGATTGGAAATTTTTAGCTGTA 

TACCTTATTCCCACACACTCTTGGGCTGACCTTTATTTTATCAATAAGCTCAATAT 
TACTTTGTTTAAAATAAGATGCTTCAGC^AAAGTCATTCTCTCTTTAACCATATAA 
TTTAAAAACTCCTCTTCACGATTGATAGCAAAATCAGAAACGTTAGGGCACCAGTG 
AGTTGAAAAAACTGGTCTTAAGTTGGAAAAACTATTATTAATAATATTATCCTATC 
CATCCATATCTATTGAAATTGTCAGGTCCATAATTTCATTTTAATTAATTATAGGA
AAGAAGAAAAGATAATACCCATTTGTTCTAT 

Sequence ID 430

CAGGGGCTTCTGCTGAGGGGGCAGGCGGAGCTTGAGGAAACCGCAGATAAGTTTTT 
TTCTCTTTGAAAGATAGAGATTAATACAACTACTTAAAAAATATAGTCAATAGGTT
ACTAAGATATTGCTTAGCGTTAAGTTTTTAACGTAATTTTAATAGCTTAAGATTTT 
AAGAGAAAATATGAAGACTTAGAAGAGTAGCATGAGGAAGGAAAAGATAAAAGGTT 
TCTAAAACATGACGGAGGTTGAGATGAAGCTTCTTCATGGAGTAAAAAATGTATTT 
AAAAGAAAATTGAGAGAAAGGACTACAGAGCCCCGAATTAATACCAATAGAAGGGC 

AATGCTTTTAGATTAAAATGAAGGTGACTTAAACAGCTTAAAGTTTAGTTTAAAAG

TTGTAGGTGATTAAAATAATTTGAAGGCGATCTTTTAAAAAGAGATTAAACCGAAG 
GTGATTAAAAGACCTTGAAATCCATGACGCAGGGAGAATTGCGTCATTTAAAGCCT 
AGTTAACGCATTTACTAAACGCAGACGAAAATGGAAAGATTAATTGGGAGTGGTAG 
GATGAAACAATTTGGAGAAGATAGAAGTTTGAAGTGGAAAACTGGAAGACAGAAGT 
ACC

Sequence ID 431 

CGCTGGGTGCCTGCAGCGCCTCCCTTGTCTCATATGGTGTGTCCAGCACTCTATTG 
TTGTAAACTGTTGNTTTGNCTGACCTAAATTNTCTTTACTAAACANATTTAATAGT 
TNAAAAAAAAAAAANANCA

Sequence ID 432 

TTTTAAAGTCATCTCTATAGGAAGGTGCTGGGCAGGGATCCCAGAGAAAGAAAGGG 
TCCAAGACTCCATTAACTGCCCTGGATGAAGGGCACTGCTACAGCAGCTAGTACCA 

GAGACTCTCCTATCTCACGGTTGAGGCAGACCCAGGATAGAATAGAGAATAAAAGG

AATGCTTATAGGAAACAATTTTGTATGGAATGCTAGATGGCCAAGCCTCAGCCTTT 
GGTCCAGTGCAACCCTTGCCTCGCTTGTCAACAGTGAAAAATTAGTTTGG'I'TAGAA 
GAACCATCTGGAAACACACCAGCTTCTGCTACCTT(^TGCT<^TTGTTAAAAAAAG
ATTAACCAGTGTGAACATTCTGATCTGTTAATTCCAGGGACTGTTTTCTTTCCAAT 
GGACTGTTTGTTGGTAGAATAACCCCCAAAAGCTCAAAGCTAAAATGCATCATCAG 
TCCTAGTCGGCAGTTCCTTAAGAATGGACTGGCGGCGTGGTTGAGCTGATATGGAA 
AAGCTGGACCTTCCTGCAGAAGATC^

CCTGAGGTATATTTCAATGAAGGCAGGTAGCTGTGCTTCTCAGAGCA 

Sequence ID 433 

TCCCGGAATCGCGGCCGCGTCGACCCGCCGCCGAGGATTCAGCAGCCTCCCCCTTG 
AGCCCCCTCGCTTCCCGACGTTCCGTTCCCCCCTGCCCGCCTTCTCCCGCCACCGC
CGCCGCCGCCTTCCGCAGGCCGTTTCCACCGAGGAAAAGGAATCGTATCGTATGTC 
CGCTATCCAGAACCTCCACTCTTTCGACCCCTTTGCTGATGCAAGTAAGGGTGATG 
ACCTGCTTCCTGCTGGCACTGAGGATTATATCCATATAAGAATTCAACAGAGAAAC 
GGCAGGAAGACCCTTACTACTGTCCAAGGGATCGCTGATGATTACGATAAAAAGAA 
ACTAGTGAAGGCGTTTAAGAAAAAGTTTGCCTGCAATGGTACTGTAATTGAGCATC
CGGAATATGGAGAAGTAATTCAGCTACAGGGTGACCAACGCAAGAACATATGCCAG 
TTCCTCGTAGAGATTGGACTGGCTAAGGACGATCAGCTGAAGGTTCATGGGTTTTA 
AGTGCTTGTGGCTCACTGAAGCTTAAGTGAGGATTTCCTTGCAATGAGTAGAATTT 
CCCTTCCTCCCTTGTCACAGGTTT^ 
TGGGGTCCGCTTTTAACTTGGACTAGTGTAACTNCTTCATGCAATAAACTGAAAAG
ACCATGCTGCTANTC 

Sequence ID 434 

TTCGGACGCAAGAAGACAGCGACAGCTGTGGCGCACTGCAAACGCGGCAATGGTCT 

CATCAAGGTGAACGGGCGGCCCCTGGAGATGATTGAGCCGCGCACGCTACAGTACA

AGCTGCTGGAGCCAGTTCTGCTTCTCGGCAAGGAGCGATTTGCTGGTGTAGACATC 
CGTGTCCGTGTAAAGGGTGGTGGTCACGTGGCCCANATTTATGCTATCCGTCAGTC 
CATCTCCAAAGCCCTGGTGGCCTATTACCANAAATATGTGGATGAGGCTTCCAAGA 
AGGAGATCAAAGACATCCTCATCCAGTATGACCGGACCCTGCTGGTAGCTGACCCT 

CGTCGCTGCGAGTCCAAAAAGTTTGGAGGCCCTGGTGCCCGCGCTCGCTACCAGAA

ATCCTACCGATAAGCCCATCGTGACTCAAAACTCACTTGTATAATAAACAGTTTTT 
GAGGGATTTTAAAA 

Sequence ID 435 

CTGCAATGTGCAATAGTTGCACCACTGCACTCCAGCCTGGGTGACAGAGTGAGAAC
CTATCTCTTAAAAAAAAAAAAAAAAAAAGGAAGAAGAGACATGAGAGGGCCCAAGT 
CACTTGCTC^CTC^CTTTCCGTGTAC^TGTACC^GAAAAGGCCATGTGGGAAAGA
GCAAGAAGGCAGCCGCCTTCAAGACAGGAAGAGAGCCCTCACCAGAAACTGAGCCA
GAACCTTGGAATTCCAGCCTCGANAACTGTGAGAAAAGAATTTTCTGTTGTTTCAG 
TCCCCCACACTATGGCATTTTGTTACGGCAGCCTGAGCTAATACTCCTACTTTGTC 
CTGCATTTACTTGGTCTTCCAGTTAGTTTTTTAGACTTTGGGAATCAGAGCAGTCA 
GTTGTCAGATTTTAGCTTACAGTTGTCCTACCTGTGCAACTGAAATTTCTTCCATT
TTAAACCAGAGCAGAGTTTTAGAGTCAAAAGAAACCAGATCTTTTAGTGCAGAAGC 
TTTCCACTGTATTANAAGTGAGGAAGTTGGT 

Sequence ID 436 

AAAAAAACTCCAGAGAAGTTTATAGAAAGAGATGACATGTAAACCCTGCTGAAAAA
TAGTTTCATTTGTTAGAATATAATTGTCTTCCACTAAAAAAAGAAAAAAAAAAGCA
TTTAAGGCTCTAAGATCTCTTGAAGTACCACTTTTCCTGAATCCCAGAGTTTTTAT 
GTGCATTATTTTTATGCGTTTGTAGTTTGATATGTTGTATTTATAAGTAGTTTTAG 
CTTTCCATTATGAATTCTTCTTTGACCCATGAGTTATTTAGGTAAGTGTTTAAAAA 

TTTACAATAGTTTATATATGCAAATATTATGTTGTTAGAGTTGGTTTTCATGTCAT
TTTTACATATACAGGGGCAGTTTCCCCAACTAAATTGTATATTCCTTAAAGCAGCA 
CTCTTAAATTTTATTTCTGTGTCAATTTCTTGNCTGTGTTTCCTGGCATGGAATAC 
ATGGCATAAAATTTGTTATGTAATTAAATGAAATATTATTATACTTTCTATTTTTT 
AGAAAAAA 


Sequence ID - 438 nt : 577 

GTCGACAGGGATGACATAACTATTAGTGGCAGGTTAGTTGTTGGTCACTTTCAACT 

CTGGGTTCAAGCGATTCTCCTACCTCAGCCTCCCGAGTAGCTGGGATTACAGGCAT 

GCACCGCCACACCTAATTTTCTATTCTTAGTAGAGACGGGGTTTCTCCCTGTTGGT

CAGGCTGGTCTCGAACTCCCGACCTCAGGTGATCTGCCTGCCTCAGTCTCCCAAAG 
TCCTGGAACCACAGACATGAGCCACCACGCCTGGCCCCTTTTAAAATATTTCTGCT 
CATTGATGATGCACCCAGTCACCCAAGTGCTCTGATGGAGATGTATAAGGAGATGA 
ATGCTGTTTTCATGGCTGCTAATACAACATTCATTCTGCAACCCCCAAATCAAGAA 

GTAATTTTGACTTTCAAGTCTTATTATTTAAGAAATATATTTTGCAAGACTATAGC

TGCCATAGACCGTGATTCCTCTGATGGATCAGACAAACTAAAATGAAAACCTCCTG 
CAACGTATTCATCATTCTAGATCCCTGAGGAATCGCCACACTGACTTNCACAATGG 
GTGAACTGGGTTACAGT 

Sequence ID - 441 nt : 552

AAACAAAATTATTCTCTGAGAGGGAAAGGACATTTGAGGGAAACATCAAATTTCCC 
CATAAATAAATGAATGGAGTTTGCAGGAAGGTGAGGGTGAGCAGAGATGTGTGTGG 
ACATCTCTGACCATCCATCGCTGTATTCAAATGGATTGTTTTATTCCATTCTGGTC
TCAGGCATGACCACGTCCAGTGAAGACATTTGAGGCAGCACATCTCAGGACCCAGG 
CAATAGACTGGCCCCAACTCAGGCTGGACTAAGGTGTGATTAATTCTTTGTTTTTT 
GTGTGGAACAGCTCACCTTGTCAGACAGCCTCAGGGCATCTCTGAGACACAGGGGC 
AGAAAATGACATTCATCTTTTGAGTCCTCATCCATGGAGTGCTGTGTTTGGGGGGC 
TGCATCTGCTGAAGCGAGAACCCCATTCTGCCACCCCACCAGGATGCCCATTCTCC 
AGGACTTCTCCAACTTACTATTAGACTAAACCAGAACAAGCAACAAACTGTATTTA 
TGCAAGCAAAATTGATGAGAAZyiTTATATTCAAATAAAGCAAZy^ATTA 

Sequence ID - 442 nt : 606 

TCGTGCCACTGCACTCCAGCCTGGACGACAGAGTGAGACTCCATCTCAAAATAAAT 

AAATAAATAAATAAATAAATAAATAAATAAAAAAATAAAAAATACTTCTGCTATGA 

AAAACCTAGTTGGTATTTTTGCTTATTTAATACTATAGAAATATGGTGATCTCATC 

TTTAATAGAGTGCTTTTAAGGTCCCCAGTGATAATCTCCTAAAATCATGAACTTTA 

AGAATTTATAATGTTAATATGAGGAAATGAAATCTGGATTATCTCACCACATATTA 

TATAATTCATTAGTGACAGAGCAAGAACTCCAGGTCACCTGTCTATTCCATGTTTT

TCCTATCTGCCTTTAAATGTTGAGATACTACCCTTATCTCATGTGAATGGAGAAAC 

TGCCTAAAATGCTAAAACTGACTCAGAGGCACCCAGACATAAGTGAAGTGTGATTA 

GAAAATCCTGGTCAGTTGAGTCTTAGCCAAATGTGTACCTACTGTGTCTGCCTCTA 

TCAAGTCAATGAAAACATGATCTGAGAACTGTAAGTCCATTTATGGAAAGGGTTGA 

TTTANAGATATTTTGAACTTNCAGTGATGAGCCCCTTCTCAAATAG 

Sequence ID 446 

CGGACTCCTGTGCTAATTGTCAGCTTACATATCATTGTATAGAGACTGTTTATTCT 
GTACCAAACTGATTTCAAAAGTACTACATNGAAAATAAACCGGTGACTGTTTTTCT 
TGATAAAGTTCTGCGTTTGGCATCTTCACTCTTTCCAAAATGTATCTGTACATCAN
AAATGTCACTATTCCAAGTGTCTTTTTAGTGTGGCTTTAGTATGGCTTCCTTTTAA 
TATTGNACATACATTGNATCTTTGTTTTATGGNAATAAGTAATAAAAATGTAGACT 
TCATATTTTGTACAAAATGTCCTATGTACAGAATAAAAAAGTTCATAGAAACAGCC 

NANAA 

Sequence ID 447 

AGGCCGAGGCAGGCAGATCNCNTGAGGTCAAGAGTTTGAGACCAGCNTAGCTAACA 
TGGTGAAACCCCATCTCTACAAAAATATA- AAAATTAGCCTGG - GTGGTGATGGGC 
ACCTGTAACCCCAGCTACTCGGGAGGCTGAGGTAGGAGAATCACTTGAACCCGGGA 
GATGGAGGTTGCAGTGAGCCAAGATCGTGCCACTGCACTCCAGCCTGTGTGACAGA 
ACAAGACTCTGTCTCAAAAAAAAATAATAATAATAATAATAATAAAAAGGAATAAC 
ATAGCTAGGAATAAATTTAATCAAAGAGGTGAAAGACTTATACACTTAAAACTACA
AAAAAAAAATCACTGAAGGAATTATAGACCCAAATAAAAATAAATAAAAAGACATT 
CTGTGTTTTAGGGAAAGAAGACTTAATATTGTTAAGATGTCAATACTACCCAAAGT 
GATCTACAGATTCAAXIATAAT 

TGGAAAAGCCAATTTTCAAATTCAGATGGAATTGCGAGGGGTTNTGAATAACAAAA 

CACNATCTTGGGGAAAAAAAACAAAAAACAAAGTCAAAGAACTCACACT 

TTATAAATTTACTACAAAGTTATAGTAATCNAA 

Sequence ID - 448 nt: 329 

TACGCACACGAGAACATGCCTCTCGCAAAGGATCTCCTTCATCCCTCTCCAGAAGA 

GGAGAAGAGGAAACACAAGAAGAAACGCCTGGTGCAGAGCCCCAATTCCTACTTCA 

TGGATGTGAAATGCCCAGGATGCTATAAAATCACCACGGTCTTTAGCCATGCAGAA 

ACGGTAGTTTTGTGTGTTGGCTGCTCG^CTGTCCTCTGCCAGCCTACAGGAGGAAA 

AGCAAGGCTTACAGAAGGATGTTCCTTCAGGAGGAAGCAGCACTAAAAGCACTCTG 

AGTCAAGATGAGTGGGAAACCATCTCAATAAACACATTTTGGGTTAAAA 

Sequence ID 450 

GAGCAGTGGCATGATCACACCTTACTGCGGCCTCCAACCCCTGAGCTTAAGTGATT 
CTCCCGCATTATCCTCCTGAGTAGCTGAGACTACAGGTGCATGCCACCATACACTA 
CTAAATTTGGGTCGGGTGGTGGTGGTGATTTTTTAATATTTTTGTAGAGACAGGGT 
CTCACTGTGATGCCCAGGCTGGTCTTGAACTCCTGGGCTCAAGCAGTCACCCACCT 
CAGCCTCCCAAAGCACTGGGATTACAGGTG 

TTTGTTTTGATGACTAAGCTGCTCTTGCTAAAAGGGCTTCTCTCTGAACTTCCCTA 
CCTTTCTTCTGTTTCCCTGGGCTAGGGCTCCATGTTGGCAGTCCTACTCCCAATTA 
ACCTGGGGCTGTCTGGTTAACCTTTATAAGATCTGCAGTCATTGGGAGACCCGGGG 
ACCAGGAATATTGTTGTTGAGGGAGCTACCCTGGAAAGTGGATGGGTGGCCAAAGG 

Sequence ID 452 

TTTGGCTTTGCCTCTAGGCATTAGATGTTATCTTTGGAGGCATCCTTCTATGAGCA 
TTCATTTTTGGACCAAGCCTGGATTTACAATTCTATTACTGGCCCAGACTTCATTT 
CTATCCAATTTCATTCCACTGTGCTATAGTTTACAACATATAATTTGACTTATAAA 
TAATTCCTGACTATGGGTTTAAAGACTGAAAATGGATCAATAGAAACTTTGAAAAT 
GTTAACATCTTGATTGCTTTTCTCAGTGTAGAAATGGACAATGTTTAGCTTAAAAA 
CTGCATGTTTTTAATGAGATACGGGGTTGAAAGACTTATTCCTGGAATTTATTGTT 
CTGGAGAAAGCCTGTTGCTATCTGCCATACCTTGGTTTACTTTGTGCAAAATGAGC 
TTCTTTTTAAGTAATGAGCTCTTTCCATGTTCAGCTTAAATTGCTGTCTTAGACAC 
TTCATCAGGGTTCCCTGCTCTGCCTCATTCCCCCTTTTGCTCACTTGCAGCCTTTG
ACATAATCCTGGGAGGCAATTGGCATCATACATATTTTGCTTTGTAATCTCCTGCT
TTGATTCTGACTGGGACCCAGC 

Sequence ID - 453 nt : 747

GGATCTAAGACCAGCCTGGCAGCCACCAGATGGTGATTCTAGTCCTGGCTCAGTCA 
GTAATAGGTCACTGACCCCAGAGAAATCAATTCAGCCTCCCCAGGTCCTTGGATTT 
CTTTCTGTGAAAATGAAAGCATAGGTAGGAATTTCCCATGGAACAGCTAGCAGAGG 
AGAAATATTAZ^AAGTCAGGAGACTCATGCTATAGTTTTCATACTTCATTACAACAA 
TGTTGTTTAGGACAAGTGAGTTAACCTGTTAGCTTCCTCTATATAl^AATGGAAAGT 
CATTAAAAACCTACATAGCAGGGTTCTTGTGAAGATGAAGTGATAATGTAGGAAGC
ATGTACAAATGTC^CATTCTGCCGTCACGTAATGGTCCTCACAGCTTGAGGTAGCA 
TTTAGCATGTGTCATGATTTAGTACAAGGGTTGGCAAACTGTTGCTCTTGGATTAA 
GTCTGGCTCATTGCCTGTTTTTCAAAGAAAAAAATTGTATATGTGTGTATATATGT 
TATATATAGGTACACACACATATGTGCTATATATAGCATATATACACACATAATAT 
ATAAACATGTACATATATAGCATTATATATATACCGTGTATAATATCTCCAGTCCT 
CATGACCAGCCATGCTTGTTCATTTACATTTGCATACTCTATGATTGCTTTCATGC 
AAC^ATGGCAGAGTTGAGTGATTGTTTTGCACAGANACTGTATGGCCCACTAAACC 
TAAAATATTAATCTCTGCC 

Sequence ID 454 

CTCCTGCCGGGCTCGTGGCGGCTTCTGTCCGCTCCGCGGAGGGAAGCGCCTTCCCC 
ACAGGACATCAATGCAAGCTTGAATAAGAAAAACAAATTCTTCCTCCTAAGCCATG 
GCATATCAGTTATACAGAAATACTACTTTGGGAAACAGTCTTCAGGAGAGCCTAGA 
TGAGCTCATACAGTCTCAACAGATCACCCCCCAACTTGCCCTTCAAGTTCTACTTC 
AGTTTGATAAGGCTATAAATGCAGCACTGGCTCAGAGGGTCAGGAACAGAGTCAAT 
TTCAGGGGCTCTCTAAATACGTACAGATTCTGCGATAATGTGTGGACTTTTGTACT 
GAATGATGTTGAATTCAGAGAGGTGACAGAACTTATTAAAGTGGATAAAGTGAAAA 
TTGTAGCCTGTGATGGTAAAAATACTGGCTCCAATACTACAGAATGAATAGAAAAA 
ATATGACTTTTTTACACCATCTTCTGTTATTCATTGCTTTTGAAGAGAAGCATAGA 

AGAGACTTTTTATTTATT

Sequence ID - 458 nt : 682 

TGCCACTGAAGATCCTGGTGTCGCCATGGGCCGCCGCCCCGCCCGTTGTTACCGGT 

ATTGTAAGAACAAGCCGTACCCAAAGTCTCGCTTCTGCCGAGGTGTCCCTGATGCC 

AAGATTCGCATTTTTGACCTGGGGCGGAAAAAGGCAAAAGTGGATGAGTTTCCGCT 

TTGTGGCCACATGGTGTCAGATGAATATGAGCAGCTGTCCTCTGAAGCCCTGGAGG 

CTGCCCGAATTTGTGCCAATAAGTACATGGTAAAAAGTTGTGGCAAAGATGGCTTC 
CATATCCGGGTGCGGCTCCACCCCTTCCACGTCATCCGCATCAACAAGATGTTGTC
CTGTGCTGGGGCTGACAGGCTCCAAACAGGCATGCGAGGTGCCTTTGGAAAGCCCC 
AGGGCACTGTGGCCAGGGTTCACATTGGCCAAGTTATCATGTCCATCCGCACCAAG 
CTGCAGAACAAGdAGCATGTGATTGAGGCCCTGCGCAGGGCCAAGTTCAAGTTTCT 
GGCCGCAGAAGATCCACATCTCAAAGAAGTGGGGCTTCACCAAGTTCAATGCTGAT 
GAATTTGAAGACATGGTGGCTGAAAAGCGGCTCATCCCANATGGCTGTGGGGTCAA 
GTACATCCCCAATCGTGGCCCTCTGGACAAGTGGCGGCCCTGCACTCATGAAGGCT 

TTCAATGTGC 

Sequence ID 459

TCCCGGAATCGCGGCCGCGTCGACCTTGTCCTTGAGCGTCAACCTTCTTTCCCTGA 
AGTGGCTGGGGTTCCTGTTTCCTTCTTTGATTGACAACTTGTGTTAACCCTCGCAC 
ATCTCTGGGCCAATTTTTGCTTGTAAGTCTTTCCGGAGACCCCTGGAATTTAAATC 
ATTAGCACCGCGCCCTTCCCCGAAGAGTCTTCGAAGGGTTGCCGCTTTTCGGTGGC 
GCAGTTCTCGCGAGAAGGTGACTTTCTTTCTCGGTATTTCCTGGTTTCCAGAATCC 
TTAGCGCGAGGCGGAAAAAATATTTCTCCCAGGTTGTGTTGATGCCGCGATTTTGA 
CTGAGACTTCTTCCCACGATTTCTGTTTTTGCTTCTCCAAGGAAAATGGCAGCTCC 
CGAGCAGCCGCTTGCGATATCAAGGGGATGCACGAGCTCCTCCTCGCTTTCCCCGC 
CTCGGGGCGACCGAACCCTTCTGGTCAGGCACCTGCCGGCTGAGCTTACTGCTGAG 
GAGAAAGAGGACTTGCTGAAGTACTTCGGGGCTCAGTCTGTGCGGGTCCTGTCAGA 
TAAGGGGCGACTGAAACATACAGCTTTTGCCACATTCCCTAATGAAAAAGCAGCTN 
TAAAGGCATTGACAAACTNCATCAACTGAAACTTTTAGTCATACTTTAATCG 



Sequence ID - 460 nt: bib 

CAGAGATCAAAATAGGCCTTACACAGTGCGACGCGAATTTAAAAGATTACCCCATT
CAGGTGTATGGATTTTGCAGTATTAAAGATGCTGCCTGGAATAGGTCATTATCTTC 
TCCAAGTACTCTGTTAAGTCAATGAGTCACATAGAGTATAAGGTTTATTATCTGCT 
TTTCTTTCATTAAATAAATCTTTATTGAATTTCTACTACATTAAAAAACCAAACCA 
AAACAAAACAAACAAAAAAAACACTTCCCTGAGCCATAAAGGAGAAGGTAGTTTTG 
ACTGGAACCTTGAAGGATGGGTAAACTTTCAGCAGATAAAGATTGAGAGAAGACCT
TCCAGGTAGAGAAAGCAGTGTGGGCACAGGCAAAGATGGAAGAACACACGTGGCTG 
TGGGAAACACAGCTAGAAGCCAGTGCGGATAGAGAGTAGGCTATGATGTGCAAAGG 
TTANACACTGGGAGAGACAGGTCCATGAGAGTAGCTTGGACTAACACAGGGAGGGT 
TTGGAATCCCAACTGGGGAACCTANAAATCAA 


Sequence ID 461 

TAGGAGGCTTATTCACTGATTTCCCCTATTCTCAGGCTACACCCTAGACCAAACCT 




ACGCCAAAATCCATTTCACTATCATATTCATCGGCGTAAATCTAACTTTCTTCCCA 
CAACACTTTCTCGGCCTATCCGGAATGCCCCGACGTTACTCGGACTACCCCGATGC 
ATACACCACATGAAACATCCTATCATCTGTAGGCTCATTCATTTCTCTAACAGCAG 
TAATATTAATAATTTTCATGATTTGAGAAGCCTTCGCTTCGAAGCGAAAAGTCCTA 
ATAGTAGAAGAACCCTCCATAAACCTGGAGTGACTATATGGATGCCCCCCACCCTA 
CCACACATTCGAAGAACCCGTATACATAAAAT 

Sequence ID 462 

TCTTTATCAAGTTGAGAAAGTTCCTCCCCTCTATTCCTAGTTTGCTAAGAGTCCTT 
CTATCCTATTTCTTAATGGTTTAGTAGATGACTCTGTGGTACTTTGAAGGTTGTTT 
GCAGAATTTCCATGCCATAGGCAATTTACCTTTCCTTGACATTTGAAGGATTGATG 
TTGGTGCCAAGTATAGAATCTTCACAGAGTCCTCCTGTAGCTTCTAAAGGTTTAGC 
TTGAAAATGTTAATTGCTTAACGCTAGTAAGTGAGTGAAAAAGCTGGGGATAAATT 
TTGTATCTTGCTTATATTTCAGTTCCCACCTCTGTCCNGACNAAACCCCCATATAT 

AA 

Sequence ID 463

TAGTTTACATATCCCAACCTTTAAAAATATTCCTCTTATTAGCTTTATATTCACTT 

TATAGAAGTTGAGTTTTAATTAAAATTCTTGGCATCCTGAAGTATGTCACATAGCA 

TGTGCTCCTTATAAATATGTTGATATCTCAGAAGACAGCATCCCGGTTTTCATTTT 

ATAAAGTACCATACTTAAGAATGCTGTAATACTTATCTTTTATAACATGTTTCCTT 

CGCTTTGCTTGNCTTTTATGNCATCAGTTTTAACTGTTTACTTCATTTAACAGNTT 

ACATCATNCAACAGTTTACTTCATTAAACAGTAGGTGGAAAAATAGATGCCAGTCT 

ATGAAAATCTTCCCATCTATATCAAAATACTTTCAAGGATATACTTT 

Sequence ID - 464 nt: 615 

CGACTTTCAACCATCAAGTGAGGAATACCTTCACATAACTGAGCCTCCCTCTTTAT 

CTCCTGACACAAAATTAGAACCTTCAGAAGATGATGGTAAACCTGAGTTATTAGAA 

GAAATGGAAGCTTCTCCCACAGAACTTATTGCTGTGGAAGGAACTGAGATTCTCCA 

AGATTTCCAAAACAAAACCTATGGTCAAGTTTCTGGAGAAGCAATCAAGATGTTTC 

CCACCATTAAAACACCTGAGGCTGGAACTGTTATTACAACTGCCGATGAAATTGAA 

TTAGAAGGTGCTACACAGTGGCCACACTCTACTTCTGCTTCTGCCACCTATGGGGT 

CGAGGCAGGTGTGGTGCCTTGGCTAAGTCCACAGACTTCTGAGAGGCCCACGCTTT 

CTTCTTCTCCAGAAATAAACCCTGAAACTCAAGCAGCTTTAATCAGAGGGCAGGAT 

TCCACGATAGCAGCATCAGAACAGCAAGTGGCAGCGAGAATTCTTGATTCCAATGA 

TCAGGCAACAGTAAACCCTGTGGAATTTAATACTGAGGGTGCAACACCCCATTTTC 

CCTTCTGGAGACTTCTAATGAAACANATTTCCTGATTGGCATTAATGAANAGTCA 

Sequence ID 469

GATTTTTAAAAATACATATAGCAAAAATATTACAGGGTCAGGGGAGACAATTAGAA 
TGATATAATTCAAAGTGGATTAAAAAAAAAACTGTCACCCAGAATACAATACCCAG 
CAAAGTTGTCCTTCATAA^TGAAAGAA^ATNAAATCTTTNCCNAACNA 


Sequence ID 471

TCCCGGGAATCTGCAGGATCCGTCGACT 

Sequence ID 472 

GACAGTGCCCAGGGCTCTGATATGTCTNTCACANCTTGNAAAGTGTGAGACAGCTG
CCTTGTGTGGGACTGAAAGGCAAGATTTGTTCCTGCCCTTCCCTTTGTGACTTGAA 
GAACCCTGACTTTGTTTCTGCAAAGGCACCTGCATGTGTCTGTGTTCTTGTAGGCA 
TAATGTGAGGAGGTGGGGANACCACCCCACCCCCATGTCCACCATGACCCTCTTNC 
CACNCTNACCTGTGCTCCCTCCCCAATCATNTTT 


Sequence ID - 473 nt : 694 

TGGGCTTTGGGCTGGCTGCAGTCTGTCTGAGGGCGGCCGAAGTGGCTGGCTCATTT 
AAGATGAGGCTTCTGCTGCTTCTCCTAGNGGCGGCGTCTGCGATGGTCCGGAGCGA 
GGCCTCGGCCAATCTGGGCGGCGTGCCCAGCAAGAGATTAAAGATGCAGTACGCCA 
CGGGGCCGCTGCTCAAGTTCCAGATTTGTGTTTCCTGAGGTTATAGGCGGGTGTTT
GAGGAGTACATGCGGGTTATTAGCCAGCGGTACCCAGACATCCGCATTGAAGGAGA 
GAATTACCTCCCTCAACCAATATATAGACACATAGCATCTTTCCTGTCAGTCTTCA 
AACTAGTATTAATAGGCTTAATAATTGTTGGCAAGGATCCTTTTGCTTTCTTTGGC 
ATGC^GCTCCTAGC^TCTGGCAGTGGGGCC^GAAAATAAGGTTTATGCATGTA^ 

GATGGTTTTCTTCTTGAGCAACATGATTGAGAACCAGTGTATGTCAACAGGTGCAT

TTGAGATAACTTTAAATGATGTACCTGTGTGGTCTAAGCTGGAATCTGGTCACCTT 
CCATCCATGCAACAACTTGTTCAAATTCTTGACAAT 

TATGGGATTCAATCCCCACCATCGATCATAGCACCCGCTATCAGCACTGNAAACTC 
TTTTGCATTAAGGGATCATTGC 


Sequence ID 474 

GGCAGCGCGGGGAGCCCGTCGGCGCCGGCGGGCGGGCCGGTTTCGAAGTTGATGCA 
ATCGGTTTAAACATGGCTGAACGCGTGTGTACACGGGACTGACGCAACCCACGTGT 
AACTGTCAGCCGGGCCCTGAGTAATCGCTTAAAGATGTTCCTACGGGCTTGTTGCT 

GTTGATGTTTTGTTTTGTTTTGTTTTTTGGTCTTTTTTTGTATTATAAAAAATAAT

CTATTTCTATGAGAAAAGAGGCGTCTGTATATTTTGGGAATCTTTTCCGTTTCAAG 

Sequence ID 475

CATAATAAAAAACAATCAACAAACAGGGAATGGAAAGAAACTTCCTCAGCATGGTG 
AAGGCCAGATATGAAAATCCCACAGCTAACATCATACTCAATGATGAAAGACTGAA 
AGCTTTTCTCCTGAGATCAGGAACAAGACAAAGATGTCACCTTTTGTCACTTCTAT 
TCAACTCATTATTGGAAGTTTTTGCCAGAGCAATTAGGTAAG 

Sequence ID - 476 nt : 476 

CAGAATCTTTTCATAGGCTGAATGTTGCTCCACAATGTGTCCTTTGACTATCTCTG 

GCTAATTATTATTTTAATCTCTTCTCAGCTTTTCCAAGAACATAACGTTAACCAAA 

GATCTTAGGCCATTCACAACTCTTTTGTAAAAATTAATGTGGATGTGAAACGAGGC 

AACAAATCCTGAAGTAGAAAGTTATTCCTGGCCAGGCACGGTGGCTCACGCCTGTA 

ATCCTGGCACTTTGGGAGGCCGAGGTGGGTGGATCATGAGGACAGGAGATCGAGAC 

CATCCTGGCCAACATGATGAAACCCCATCTCTACTAAAATACAAAAAATTAGCTGG 

GCATGGTGACGCGTGCCTGTAGTCCCAGTTACTCGGGAGGCTGAGGCAGGGGAATT 

GCTTGAACCTCGGAGGTGGGAGGTTGCAGTGTGCCGAGATCACGCTACTGCACTCC 

AGCCTGGCAACAGAGCAAGACTCCATCT 

Sequence ID 477 

AAACAGAAAGTTTCTTCTAAAGGCATGATTCAGTTAAGTCATTCTTAAGTGTTAAA 
AAATTGTGAAAAATGTGCCTGTAATCCCAACACTTTGGGAGGCCGAGGCAGGCAGA 
TCACGAGGTCAGGAGATCAAGACCATCCTGGCTAACAAGGTGAAACCCCGTCTCTA 
CGAAAAATACCAAAAACATTAGCCGGGCGTGGTTGTGGGCGCCTGTAGTCCCAGCT 
ACTTGAGAGGCTGAGGCAGGAGAATG 

Sequence ID 478 

TTCTTGGGATATTGATGACTACTGTCTGAGAGGTGCTGTGGGGAGATTTTCAGGAT 
TGTGTGGTCTTTGAGGGGGGTGTTTTTTTAAGACAACATTGACCACTGTCCACTGT 
CCACATGATCATTGTAAAATTGCAATGCCGCATGCTAGTTGGTTACATAAGACATA 
ATTCCAGTGATTGAAGGTGGTTACACTGTATGGTGGTGTGTTCAAGATGGCACTGG 
CATCTTTGAGCAGAGCCTGGCTATGCAGCATCATTTGAGTTTTTTAAACACCCTAN 
AGGTCTGGTTGTTGTTGCTGTTGTCCTTTCCTGTGAAAGTCACAANANAAGTTACA 
GTCCAGGTGAACCTGGAGTTTATAGGTTGGTTTTGTTTCTGNTATATATATATATA 



TCACCTGCATGCTATTTCTAGTGAGTGCTAAATACAGTATGGTCCAATGACAATAA 
CAGCCCATGGTACTGCCAG

Sequence ID 479

CATCAGTCTGTTATCCATGCTGACTTTCCGAAGACTTGCAGCTACTGCATTGATAT 
CTTTCCTGCCAATAAGCAAAGTGTTGAACACTTCACAAAATATTTTACTGAGGCAG
GCTTGAAAGAGCTTTCAGAATATGTTCGGAATCAGCAAACCATCGGAGCTCGTAAG 
GAGCTCCAGAJ^AGAACTTCAAGAACAGATGTCCCGTGGTGATCCATTTAAGGATAT 
AATTTTATATGTCAAGGAGGAGATGAAAAAAAACAACATCCCAGAGCCAGTTGTCA 
TCGGAATAGTCTGGTCAAGTGTAATGAGGACTGTGGAATGGAACAAAAAAGAGGAG 
CTTGTAGCAGAGCAAGCCATC^AGCACTTGAAGCAATACAGCCCTCTACTTGCTGC
CTTTACTACTCAAGGTCAGTCTGAGCTGACTCTGTTACTGAAGATTAGGGAGTATT 
GCTATGACAACATTCATTTCATGAAAGCCTTCCANAAAA 

Sequence ID 481 

CACACTTTCATGATAAAAACAGAACCTAGGAATGAAAAGAAATTATAGCAACATAA
TAAAGACCATATATGAGAAGCCCACAGCTAACATACTGTATGGTGAAAAACTGAAA 
GCTCTTCCTCTAAGATCAGGAACAAGGCAAGGATGCCCATTCTTGCCACTTCTATC 
GAACGTAGTACTGGAAGCCCTAGCCAGAACAACTAGGCAATAGAAAGAAATTAAAG 
GCATCCATNTCAGAAAGGAAGAANCAAAATGCTGTCTGTTTAANATGACA 


Sequence ID 482 

TTTCTATANAAAAAAATTTTTTAAAATAATTGTAAAGTTAGATTTAAAATTGTAAA 
ATATAAAATCACAAAGGAATGTACCCAATAAAATGTAAATGCNCCATAAAAAAAAA 
AAAAAAAAAAAAAAAAAA 


Sequence ID 483 

CGNTAACGTGCAATCCGCCGCACGCCAGCAAACTGGACAAACTCCGGGATCTCATC 
GAAGCGATTGAGCACCAGTACCAGAGTAATACCGGACTGATGTAACGAGGCGAGTC 
GCTCATCCAGCTTGCTGACGTGAGGCAACATCCAGGCCATCGAACGGNTCATCAAG 

AATCAACAAGTCAGGCTCCGACATCAGCGCCTGACACAGCAGGGTTTTTCGCGTCT
CGCCAGTGGAAAGGTATTTAAAGCGTCNGTCGAGGAGGGCGGTAATACCGAACTGC 
TGCGCCAGTTGCATGCAACGCGGTGCATCCTTTACTTCATCCTGAATGATCTCAGC 
CGTAGTGCGTCCGGTGCCATCTTCGCCAGGGCCGAGCATATCGGTGTTATTCCGCT 
GCCATTCGTCGCTGACGAGTTTTTGCAATTGCTCGAAGGAGAGACGAGTGATGTGG 

GAAAACTGGCTTTGCCGTTCACCTTTCAAAAGCGGGAAGTTCCCCCGCCAGCGCGC

GGGCCAGGGCCCGAT 

Sequence ID 484

TTTTTTTTTTTTATTCTATTAAAAAATGTTNNTGAAAAAAGATACTTAAATTTTAA 
AGATAACTNAATTCCTAANGATTTAAAAT 
GCAAATGCNTAAAAAGACCCCANA 
AAGCATATATATNTCATAAAAGCAATAANAAGGCNTAAAGCAAGTTTGGGGAGAGC
TTATTTAAAACTTGTAAAAATCATTTGAATTTTTAAAAGTTTTCAAAC 

Sequence ID - 485 nt: 551 

TTTGGAACACAAAGTTCCCTTTTTAGAAGAATAGGTATTGAGCCCTTGAGCGTGGG 

TAGAAAGATAGAGACAGAGTGATTTGCAAAATAATGGAGGATCATATTTATATATG
AATTTTCACTTATTTGAACTTTCAGATATCANCTTNAAAANCTTTGGTTTAAGTAA 
AGTNTNTTAATGAGACTCCTTGGATGAAAGTAACCAAAACCAGTAAAAATAAGGTA 
ATAAGGATGTAATAGTTTCTTATGGACACT
GAAAAGAATTAGAACAAATAACTGGAAGGCCATCAGGAGTCCAAAACCATCACTCT 

TTTATATTTTATATTTTATTTTTCTCTCTTCANATGAGCATTCTCTTTCTATGTCC
ATATGGTANAAGGCGGCAGCTCCATAGATTATGGCTTCAGATGTTACAGTTCCGCT 
NAATGCAGGGACAGACTTGCTATCTTTCAGTCCCCTTACATATCCTGGGGAGAGAG 
CAAATGATTGACTGGCTTGAGTCAGGTGCCCGTTCCCTTTCCAATCT 

Sequence ID - 487 nt:224

GTTTGNTTGTGACCATCTGTACTTGTAATTTCTTTACNTTCATTGGTATGAAAAAT 
ATGTTCTTAGAAGCANGAAAAAGAATTCAGNTTTGCTTTGTATACTAAATTAAATG 
CTGTAATTTTGATAAAATGAAAAATCTGCTTTATTTGCAACAATTGGTTTCTTCCT 
TGACGTCAGCCTCACTCTTGGACTTTGGTATTCAGCCNGNCACCCCTGGGAATTCC 


Sequence ID - 488 nt : 349 

GTGCCTCCCTGTGTGAGTAGCCTAAGGTGCATTGAAAAAGACTGGGATGTGTTTTA 
TTTTTTTGTATTAGATAGCATTAACCTTACTGTTGAAGTATTTTTGGTGGAGTATT 
AGTGACAAGCCATTGAGTCTTAAGCCTTACGGCTTCCTATAAAATCACTAATTTCG 
TGTGTGTTTGTGTGTAGGTTACGTTATATATAGGATTCGTGTTCGCCGTGGTGGCC
GAAAACGCCCAGTTCCTAAGGGTGCAACTTACGGGAAGCCTGTCCATCATGGTGTT 
AACCAGCTAAAGTTTGCTCGAAGCCTTCAGTCCGTTGCAGAGGANCGAGCTGGACN 

CCCTGGGGGGCTC 

Sequence ID 489

TTAACAGCTGCATAGAGTTTTAAAAGTACATTATATTTTGTCAGACAAGTAAAATA 

TCTGTTTTTC^CGCAAAAAAAGCCATGAAATACGTAATTTTTTAAAGACAAAAAAT 
CATCTTTTGAGTTTGCTCTTTGGTTTTTCTTCATTCCTTTTGAGGATTGGGAAAAC
AGAAAGATTCTTTGATTTGGGTAATGAAGAGGTAATTTGGGACAGTGTGGTGGTAC 
CAGGAAGAAAGAGGATTGGAAAGGCCAGTACTGTTTTAGTTGCTCGGCACTGTTGG 
TTTTGTTTTAATGTGGTTGCCCTGTCCACTACATGGTTCTATCAGTAGTGTAATCC 
ATTTTCAATGTAAAGCTCTTTTAGTTTTTGTCATAGACATAAATTAATATTTTGAG 
AGGCATCCCTCACCTGTTCATTTCTTCTGTGTTGAAATGAAGTACTTAAAATTACC 
GTTATACATGAACTTTGTGGACTGTAAGATTTGTTATATATGTTCAAATGCCTTTT 
AGCTGGCTTTTTAATTAATATGCCTGTTTTGAGTGCTTAATACAATGTAATGNGGA 
TTGTAAATCATACCTATTTTAAATCATTCCTTCCTGTATATTTGNACTCAGAGAGC 
CTTATTTTATTCTTCCAGC 

Sequence ID - 491 nt : 382 

TTTTCTTAGAACTTTATTTTTTCTGGCCAGGCGCAGTGGCTCACACCTGTAATCCC
AGCACTTTGGGAGGCCAAGGCAGGTCGA'TC^CCTGAGGTCAGGAGCTCAAGACCAG 
CCTGGCCAACATGGTGAAACCCTGTCTCTACTAAAAATACAAAAATTAGCTGGGCG 
TGGTGGCGCATGCCTGTAATCCCANCTACTCAGGAGGCTGAGGCAGGAGAATTGTT 
TGAACCCGGGAGGCGGAGGTTGCANTGAGCCGAGATTGCGCCACTGCACTCCAGCC 
TGGGC^CAGAGCGAAACTCCATCT(^AAAAAAAAAAAAAAAAACAACCTTTATTT 
TTTCTGATTTTAAAAGTAATAACTAGTTTGTAGAAACATTAAAAGT 



Sequence ID 492 

ACCCTAAACATAACTTAAAATTTGTTNGGAATTTGAAAGTACAGAATTTTCCTGTA 
ATTGAGACTNTTTAAACTTTTGTGGTTGGAGAAGGTATTCTATTTTTTGAAAATAT 
CTGTAAGTTTTATCTAAATAGTAAACTCTAAGTATTCTTCCCCTTTACTTACAGCC 
ACCCTGGGAATCTGAGACTAGAGAAAATAAAGTTTGTCTCTTGTTCTAAGGAGGGT 
CTGGTTTAGAAATCTGATTTAGACATAGAAAAATTGCAAGAAGCTTGAGGTGATTG 
GAAGATACGATTTTGTTATCAAAGNATGTTTCTGTTTTATAGATTTTATTCATCTA 
CAACTCCTTATTAATATATTTAAGAAGTCATTAACCCACCATTGATTACTTGATAT 
AAAAGGAGAANCGGTGGTAAAAGGTGAAATANAATTTTTAATTTTTTTTTTTTTAA 
GTTTAGGATTTTTTTTTAAATTCTAAGAGTTTCTGTCATTTGGGGACAATCAGAA 

Sequence ID 493 

TGGGAATCATAATTNGTTAACTGAAGCTNATAAGATGAGAGCATTCANAGAGAAAA 
GAACGGAAAGATTGAATATCAGTTTCCCTTCTTTAAAAAAATTGTGGATATGTGAT 
CTAGCTTCTTGAGCATCACAGTGACTGATTGGCTCGTGGTAATTGATCGCTATGCT 
GACAATCTTATCTCCACCTATGTCATTCAATTTTCTAAGAGGCAAAATCCTTAATC 
AGGAGGAGAGTTTAGCTCTAGCTAAATTTCCCTTGTCCAGCATGCTCCTGCTCCCC 
CAACTTGTGGAAACAGCTAAAGGATTGGACTAGGAGCANAAGTTTGGAATGGTTAA
AATGTAGCAAC^TGTGTTTCCTGAAACAAAATTCCACTATAATAAAAA^GCATTT 
GAATGCTCCCTTGTAATTCTGTTGGAGCTTGTTGCCTTTTTTATGACACAACCATA 
ATCAGTGATAGACAGTAGGATAAAGAAGCAAGAGCAAAGCAATTAAGTAATAATAG 
CACTACAAAAATGTGTGCTGTACTTACCAAACACGACATTTATGAATTATTANATA
GGAATAAGGGGATGGT 

Sequence ID 494 

GACCCAGCCATCTAAATAAGTTRTACATGTTGCGTATTTTTTTGTTAGGGACTTAT


CTTCCGAAGAGGAAAGGTTTATGAAACCTAAAGTAACAATGATAGCTTGGAATCAA 
AATGATAGCATTGTTGGCACAGCTGTGAATGATCATGTCCTCAAAGTGTGGAATTC 
TTACACTGGACAACTGCTTCATAACTTAATGGGACATGCTGATGAAGTATTTGTTC 
TGGAGACACATCCCTTTGATTCCAGAATTATGTTATCTGCAGGACATGATGGCAGC 
ATATTTATATGGGATATTACAAAAGGTACCAAGATGAAACATTATTTTAATATGGT
AAGTGAAGTGAGATGTACCTTGATACATGCTTGATAATTTGTTTAGAGTATTTGGG 
TTATGCGGCTTACCCAGAAATTGATCTGCTTGTTTTGGCAGTTTGTTTTTACAAAT 
CAACATATTCZ!AAAGCCTGCTAAATATTAGACAGCTACATGTATATACGTA(^TACA 
TGAA 


Sequence ID 495

TTTC 

Sequence ID 496 

CTCGCTGGCGGGAGGCCACGGGCTTTCCACAGCGCGGGGGAACGGGAGGCTGCAGG
ATGGTCAAGCTGACGGCGGAGCTGATCGAGCAGGCGGCGCAGTACACCAACGCGGT 
GCGCGACCGGGAGCTGGACCTCCGGGGGTGATCTGGACCCTCTGGCATCTCTCAAA 
TCGCTGACTTACCTAAGTATCCTAAGAAATCCGGTAACCAATAAGAAGCATTACAG 
ATTGTATGTGATTTATAAAGTTCCGCAAGTCATAGTACTGGATTTCCAGAAAGTGA 

AACTAAAATTTTAATCCAGGTGCTGGTTTGCCAACTGACAAAAAGAAAGGTGGGCC
ATCTCCAGGGGATGTAAJ^GCAATCAAGAATGCCATAGCAAATGCTTNAACTCTGG 
CTGAAGTGGANAGGCTGAANGGGTTGCTGCAGTCTGGTC 

Sequence ID 497

GAAGACCTCACATCTGAGAGCTCATCTGCGTTGGCATTCTGGAGAACGCCCTTTTG
TTTGTAACTGGATGTACTGTGGTAAAAGATTTACTCGAAGTGATGAATTACAGAGG 
CACAGAAGAACAGATACAGGTGAGAAGAAATTTGTTTGTCCAGAATGTTCAAAACG
CTTTATGANAAGTGACGACCTTGC

GTATTCACTCTANGAGTACAGTGCTGGCATCTGTGGAAGCTGCGCGAGATGATACT 
TTGATTACTGCAGGAGGAACAACGCTTATCCTTGCAAATATTCAACAAGGTTCTGT 
TTCAGGGATAGGAACTGTTAATACTTCCGCCACCAGCAATCAAGATATCCTTACCA 
ACACTGAAATACCTTTACAGCTTGTCACAGTTTCTGGAAATGAGACAATGGGAGTA 
AATATTACACAAATACTTATTCATTGNGGTTATTTTTATACAGTAGTGAGAAGAAT 
ATTGTTCCTAAGTTCTTAGATATCTTTTTTTGGATGTGCAAAAATTTTTGGATTGA 
CAGTAACTTGGGTATACATGACACTGAAATGCCTTACTTTGGATGA 

Sequence ID 499 

TGCCTGCGGGCCAGGACCTCGCCCAGCCCATGTTCATCCAGTCAGCCAACCAGCCC 
TCCGANGGGCAGGCCCCCCAGGTGACCGGCGACTGAGGGCCTGAGCTGGCAAGGCC 
AAGGAGACCCAACACAATTTTTGCCATACAGCCCCAGGCAATGGGCACAGCCTTCC 
TCCCCANAGGACCCGGCCGACCTCAGCGCCTCCTGCAGGCTAGGACACTGGTGCAC 
TACACCCCATGCCTGGGGGCCGAGATTCTCCAGCAGAAAGATGCAATATTTTTTGT 
TTCCTTTTTTTCCATTTTTTTCTCTAAGGAATCAATATTTCAATATGTTGAGTGTG 
TGTCCAATGCTATGAAATTAAAATATTAAATAACATATTTATGGCATTTTCTTGAA 
GAGTGTGGTTGAAGAAATATTTCTCCTTTTGTTTTTCTTTTTTTTTTGNTTGNTAC 
TGCCACTTCTTTTTAGGAGCAAATCTCCCGAGGGGTGTACGGNATTTCTTGACTCT 
GGGAACAGCTGCTACCCCCAAGACTTGCCACGTTGTTCTGCCCTCAAATGGAATTA 

AGTG 

Sequence ID - 500 nt: 390 

GGAATATGGTCAGGATCTTCTCCATACTGTCTTCAAGAATGGCAAGGTC3ACAAAAA 

GCTATTCATTTGATGAAATAAGAAAAAATGCACAGCTGAATATTGAACTGGAAGCA 

GCACATCATTAGGCTTTATGACTGGGTGTGTGTTGTGTGTATGTAATACATAATGT 

TTATTGTACANATGTGTGGGGTTTGTGTTTTATGATACATTACAGCCAAATTATTT 

GTTGGTTNATGGACATACTGCCCTTTCATTTTTTTCTTTTCCAGTGTTTAGGTGAT 

CTCAAATTAAGAAATGCATTTAACCATGTAAAANATGANTGCTAAAGTCAGCTTTT 

TAGGGCCCTTTGCCAATAGGTANTCATTCAATCTGGTATTGATCTTTTCACAAA 

Sequence ID 502 

ACCCGCCATCTTCCAGTAATTCGCCAAAATGACGAACACAAAGGGAAAGAGGAGAG
GCACCCGATATATGTTCTCTAGGCCTTTTANAAAACATGGAGTTGTTCCTTTGGCC
ACATATATGCGAATCTATAAGAAAGGTGATATTGTAGACATCAAGGGAATGGGTAC
TGTTCAAAAAGGAATGCCCCACAAGTGTTACCATGGCAAAACTGGAAGAGTCTACA
ATGTTACCCAGCATGCTGTTGGCATTGTTGTAAACAAACAAGTTAAGGGCAAGATT
CTTGCCAAGAGAATTAATGTGCGTATTGAGCACATTAAGCACTCTAAGAGCCGAGA
TAGCTTCCTGAAACGTGTGAAGGAAAATGATCAGAAAAAGAAAGAAGCCZIAAAGAGA 
AAGGTACCTGGGTTCAACTAAAGCGCCAGCCTGCTCCACCCAGAGAAGCACACTTT 
GTGAGAACCAATGGGAAGGAGCCTGAGCTGCTGGAACCTATTCCCTATGAATTCAT 
GGCATAATAGGTGTTAAAAAAAAAAAATAAAGGACCTCTGGG

Sequence ID - 503 nt : 109 

ACATTTTCCGGNCCTTTTGCCATACACAGTTACAGAGATCAGTCAAATC 

CCACTGAGATCTCATTTATTGCGACAGATGCAGAAAATAAATA 


Sequence ID - 504 nt : 374 

CCAGCAACGACCCATACCTCAGACCCGACGGCCCGGAGCGGAGCGCGCCCTGCCCT 
GGCGCAGCCAGAGCCGCCGGGTGCCCGCTGCAGTTTCTTGGGACATAGGAGCGCAA 
AGAAGCTACAGCCTGGACTTACCACCACTAAACTGCGAGAGAAGCTAAACGTGTTT 
ATTTTCCCTTAAATTATTTTTGTAATGGTAGCTTTTTCTACATCTTACTCCTGTTG
ATGCAGCTAAGGTACATTTGTAAAAAGAAAAAAAACC^^ 
TTTGTATTGTANATAAGAGGAAAAGACTGAGCATGCTCXCTTTT 
TTACAGTATTTGTAAGAATAAAGCANCATTTGAAATCG 

Sequence ID 505

GTACAGGAGGTAAATTGGATACCCCATCTAAGGGGATCTGTGAGACCAGGTAGTTA 
TTTGGAATGAAAGAGTAAGATATTAAACCAGCCAGCATGTCAACAGGTGGGTGATA 
GTCTTGTTCTCACAGACAACAGATGGCCATCATCTTAAAACAACATTTATGTTAAC 

CAGCAGATAAGGGACTCCTGCATTGTCAGTGGACTTTGAGCCTGAGTTTTTCTACT

TGCATAGGTGAAAGTGGACTGCAATGCTAGTATAAATGCCGTATGATGACTAGTAC 
CCCTTAGGGAGCTCCAGTTTGCCTTCCTGGGGAACCACAGACCCCAAGTGTAATTT 
CCTGAGGACAGCCCGACTTCT 

Sequence ID 506

GTTACTGTGAGCCTGTCAGTAGTGGGTACCAATCTTTTGTGACATATTGTCATGCT 
GAGGTGNGACACCTGCTGCACTCATCTGATGTAAAACCATCCCANAGCTGGCGAGA 
GGATGGAGCTGGGTGGAAACTGCTTTGCACTATCGTTTGCTTGGTGTTTGTTTTTA 
ACGCACAACTTGCTTGTACAGTAAACTGTCTTCTGTACTATTTAACTGTAAAATGG 

AATTTTGACTGATTTGTTACAATAATATAACTCTGAGATGTGTGAAAAAAAAAAAA

AAAAAAAAAAAAA

Sequence ID - 507 nt : 521

CTGCGGTGGAGCCGCCACCAAAATGCAGATTTTCGTGAAAACCCTTACGGGGAAGA 
CCATCACCCTCGAGGTTGAACCCTCGGATACGATAGAAAATGTAAAGGCCAAGATC 
CAGGATAAGGAAGGAATTCCTCCTGATCAGCAGAGACTGATCTTTGCTGGCAAGCA 
GCTGGAAGATGGACGTACTTTGTCTGACTACAATATTCAAAAGGAGTCTACTCTTC 
ATCTTGTGTTGAGACTTCGTGGTGGTGCTAAGAAAAGGAAGAAGAAGTCTTACACC 
ACTCCCAAGAAGAATAAGCACAAGAGAAAGAAGGTTAAGCTGGCTGTCCTGAAATA 
TTATAAGGTGGATGAGAATGGCAAAATTAGTCGCCTTCGTCGAGAGTGCCCTTCTG 
ATGAATGTGGTGCTGGGGTGTTTATGGCAAGTCACTTTGACAGACATTATTGTGGC 
AAATGTTGTCTGACTTACTGTTTCAACAAACCAGAAGACAAGTAACTGTATGAGTT 

AATAAAAGACATGAACT 
Sequence ID 508

AAGCTCATGATTTTAAATGTATTTTTCTAATAAACTATACTCCCATTTAAAAATCA 
CCAATACCTTAATGTTTCAATTATATAAGCTAATTAl^AAATAAAGGCTGGGCGTGG 
TGGCTCACTTTGGAAGACCGAGGCAGGCAGATCACCTGAGGTCAGGAGTTCGAGAC 
CAGCCTGCCCAACATGGAGAAACCCCATCTCTACTAAAAATACAAAATTAGCCAGG 
CATGGTGGCACATGCCCGTAATCCCAGCTACTGGGGAAGCTGAGGCAGGAGAATCA 
CTTGAACCTGGGAGGCAGGGGCTGCAGTGAGCCGAGATCATGCCATTGCACTCCAG 
TCTGGGCAAC^TAGTGGAACTCCATCTCAAAAATAATAAAAAAAATAAAATAAAA 
ATAAAATTCAAACCTAAAATAGATGCTCTACTTCAGGAGTGGGCAAATTAATCACC 
TGCATCCTTTTTTTGGGCTTTC 

Sequence ID - 509 nt: 575 

TTTTTTTCTAAATGGNGATTACTAATATATGTGGAGACTATTAATCTCTTTTCTGT 

TGCCATTAGTTCATTTTTCC 

ATAAAATATAAGTTAAAAGAAAAACATAAAACCCTACAATCTTACCCACC(^GA(^ 
ACTACTATTAATACCTTAGTATTAACATATACACATCATGTATATGTATAAATTTA 
TCTTAAACAAAAATAAAATTATTCTTTACATATTGTTTTAAAACCTATTTATCTGG 
CCAGGTGCCGTGGCTCACGCTTGTAATCCCAGCACTTTGGGAGGCTGAGGCACGTG 

GATCACCTGAGGTCAGGAATTCGAGACCAGCCCAGCCAACATGGTGAAACCCTGTC
TCTAATGGTTTAAATACCAAAAAATTAGCTGGGCATGGTGGCACATGCCTGTAATA 
TCAGCTAACATGGGAGGCTGAGGCAGGAGAATCACTTGAACCANGGAGGGGGAGGT 
TGCAGTGAGCCGAAATCACACCACTTCACTGCAGCCTGGGCAACAAAGCAAGACTG 

TCTCAAAAAGAAAAA



Sequence ID 510

CACTGTCATTCCCAGGAGGCTTTGGAGTCAGAACTGGATTCAAATTCTGACTNTAT
GTTGTGTGACTTGGGCCAATAGCTTCTTT3STTGTGCCTCAGTTTCTTTAGCTGTAAA 
TANACGGGTAGGTCACCCCTTACCCCATAGGTTATGGGGAAAGTTACAGAAAATGG 
TCAGCTGGGCNCAGTGGCTCAAGCCTGTGGTCCCAGCNCCTTGGGAGGCCAAGGTG 
AGCAGATTGCTTGAGCCCAGGAGTTTGACACCAGTNTGGCAACGTGACGAAACCCT
ATCNCTGTGAAAAATACAAAAAATTAGCCAGGCATGGTGGTGTGTGTCTGTGGTTC 
CAGCTGCTTGAGAGTTTGAAGTGGGAGGATCACCTGAGCCCAGAAGGTCGAGGCTG 
CAGTGAGCTGTGATCGCGTCACTGCACTCCAGCCTGGC - GACAGAGTGAGA- CCCC 
T - TTTGAAAAAAAAAAAAAAAAAAT 


Sequence ID 512 

GTGAGCGGTGGTGGTTTATTCTTCCGTGGAGTTAAGGGCTCCGTGGACATCTCAGG 
TCTTCAGGGTCTTCCATCTGGAACTATATAAAGTTCAGAAAACATGTCTCGAAGAT 
ATGACTCCAGGACCACTATATTTTCTCCAGAAGGTCGCTTATACCAAGTTGAATAT 
GCCATGGAAGCTATTGGACATGC^GGCACCTGTTTGGGAATTTTAGCAAATGATGG
TGTTTTGCTTGCAGCAGAGAGACNCAACATCCACAAGCTTCTTGATGAAGTCTTTT 
TTTCTGAAAAAATTTATAAACTCAATGAGGACATGGCTTGCAGTGTGGCAGGCATA 
ACTTCTGATGCTAATGTTCTGACTAATGAACTAAGGCTCATTGCTCAAAGGTATTT 
ATTACAGTATCAGGAGCCAATACCTTGTGAGCAGTTGGTTACAGCGCTGTGTGATA 

TCAAACAAGCTTATACACAATTTGGAGGAAAACGTCCCTTTGGTGTTTCATTGCTG

TACATTGGCTGGGATAAGCACTATGGCTTTCAGCTCTATCAGAGTGACCCTAGTGG 
AAATTCGGGGGATGGGAAGGCCACATGCATTGGAAATAATANCGCTGCAGCTGTGT 
CAATGTTGAAACAAG 

Sequence ID 513

TTTTTTTTTTATAAACTCCAATCATTTCCAGAGCTACTTAGCTCAGCATCTTTTTT 
TTCCACGCTCTTAAGTTGTGTTTATACATTTTTGATACAGTTAGATTGTTTTTGTC 
ACATTCTTCATTCTATCCTGGGATCCCCCAACCACCTAAGTGGATTTTTTGATAAT 
TTGCATGCTTTAAGGATAACTCTTCATTCTGNAAAGGGCTATGGGTTTTGGCAAAT 

GCAGAGTCATGTATCCAAGATTACAATATCGCACAGAAGAGTTTCATCACTATATA

AAACTCACCAGTCTTCCTCCTATTCAACCATCTCCATGCCTTCTTCCCAGCCCTAA 
CTCCTTAAAACCACTCATATCTTTACTATTGCTATAGTATTGCCTCTTCCACCATG 
TCATATAAATGGAAACATACAGTATTAGTCTTCTCAAACTAGTTTCTTTTACCTAA 
CAACATGCATTTAAGATTCATAGTGTCTTTTAATGACTTGATAGATTATTTCTTTG 
TAGCTGAATAATATTGCATCTTATAGATGTAACCGTTTGTATATCCATATTTTCTC
ACAGCCTATGACTTGNCTTTTGATTCTCTGAACAGGCCATTCACAAAGCAGAAGTT
TTAATTTTTATAAAGCTAATGNATCAACTT

Sequence ID 515

CCTGGATGACAGCATATCTGTTTATAGCTCAGTTTACTGAATACTTTAAGCCCACT 
GTTGAAACCTGCT 

Sequence ID - 518 nt: 502 

GATGCATGTCCAGCATAGGCAGGATTGCTCGGTGGTGAGAAGGTTAGGTCCGGCTC 

AGACTGAATAAGAAGAGATAAAATTTGCCTTAAAACTTACCTGGCAGTGGCTTTGC 

TGCACGGTCTGAAACCACCTGTTCCCACCCTCTTGACCGAAATTTCCTTGTGACAC 

AGAGAAGGGCAAAGGTCTGAGCCCAGAGTTGACGGAGGGAGTATTTCAGGGTTCAC 

TTCAGGGGCTCCCAAAGCGACAAGATCGTTAGGGAGAGAGGCCCAGGGTGGGGACT 

GGGAATTTAAGGAGAGCTGGGAACGGATCCCTTAGGTTCAGGAAGCTTCTGTGGAA 

GCTGCGAGGATGGCTTGGGCCGAAGGGTTGCTCTGCCCGCCGCGCTAGCTGTGAGC 

TGAGCAAAGCCCTGGGCTCACAGCACCCCAAAAGCCTGTGGCTTCAGTCCTGCGTC 

TGC^CCA-CACATTCAZy^GGATCGTTTTGTTTTGTTTTTAAAGAAAGGTGANAT 

Sequence ID 519 

CTGCGATNGAGTTTTGAGAGGAAGGANTAAAGTNCTCATCTCNGACGGTGAGAAAG 
ATCATNACTAAGGAAACGCAGGGTTGGAAGCAGTGCTGANTGTCCAGTTGAGTTTC 
ATGANCAAACATTTGCTGTGGGACCAGTTTTCATGGNGGTTTGTCATTTTGTCCAG 
CTGCCTGGAGCTGCTTGGTTGAAGGCACAGAATAATCAGGATTAATTGTTNAACTT 
GTATGAATTTCTTTATTTTAAAATAGGAATAATATCTGCCTTGGGAGCAAGTTGTA 
AGAGTTAACTGAAAGCTTNAGGAAAAACTTTCCCTTGCTATTTAAGTAGGGCTTTA 
CAAGTTACAATTCTATCACAGTTTTAAGATTATAAAC 

Sequence ID 521 

GCGGCGCANCTGCGGATCCANAAGGNCATAAACGANCNGAACCTGCCCAJ^NNCGTG 
TGATATCACCTTCTNAGATCCAGACNACCTCCTCAACTTCAAGCTGGTCATCTGTC 
CTGATGAGGGCTTCNACAAGAGTGGGAAGTTTGTCTCAAAAAA 

Sequence ID - 523 nt: 585 

GATTTACTGTGGGAATTTGCTCATGCAATTATGGAAACCTAGAAGTCCCATAATAT 

GCCATCTTCAAGCTGGAATCCCAGGAAAGCAGGTGGTGTAATTCTGAGATTGAAGT 

CTTGAGAACCGGGGGAGTCAATGGTGTAACTCCCAATCTAGGGCTTAAGGCCCAAG 

GACCAGGGCTGCTGGTGTGCAGATGCAAATCCTGGAGTTCAAAGGATTGAGAACCA 

GGAGCTCTGGTGTCTGAGGGCAGTAGAAGATGGATGTTCCAGCTCAAGAAGGGAAA 

GTAAGAATCCGTCCTTCCTCCACTTTTTTGTTCTATTCAGATGAGCCCTCAATGGA 

CTGAACGATGCTCACCCACACTGTGAGGGCTGGTCTTCTTTATTCAATCCACTGAC

TTAAGTGCTGATCTCTTCTGGAAACACCTTCACAGACACACCO^GAAATAATGTTC

TACCAGCCATGGGCCTGTTACTTAGCCCAGTCAAGTTGACACAGAAAATTAGCTAT 

CACAACATCTGTGTGTGTATATACATATGTATTTGCATGTGTGTGTATATATGGNG 
TATATATATTCATGTGTGTGTATAT 

Sequence ID 524 

CTTTTGCCAGTAGGCCCCCTGAGTAGGTTCCTCTATCTTTTGGCATGACCCCAGAA 
GTCTTTGATAACTTCCTTGCTTTCTGATGTGACAAGACATCCAGGGCCAGATTGTC 
CATATCCTGCCCCGGATGCACGATGCACTGTTTCTCCAAGAATCCCTGTGTCCTTT 
GCTGATGATGCCATGATTTTAAGTTCTCTAATATAGTTTTATCTCTTTGTTTCAGA 
TAATGCTTTTGTGTTCTCACATGTCCTGCTCTCTCTCTCTCTCTCATTTTGGTGTT 

AAATTTACTCTTCTTGACTAGTATCCTGTC^CTTCTGAGGACT(^TATTTTTGC^ 

CTTGAAAATTATTCTTATTTATTTAAGTATATGTTNCTGAAACTCTCATTAGACAC 
ATTTTG 

Sequence ID 525 

GTTAAAAAAAGTAAAAGGAACTCGGCAAATCTTACCCCGCCTGTTTACCAAAAACA 

TCACCTGGTAGGATCACCAGTATTAGAGGCACCGCCTGCCCAGTGACACATGTTTA 

ACGGCCGCGGTACCCTAACCGTGGAAAGGTAGCATAATCACTTGTTCCTTAAATAG 

GGACCTGTATGAATGGCTCCACNAGGGTTCANCTGTCTCTTACTTTTAACCAGTGA 

AATTGACCTGCCCGTGAAGAGGCGGGCATAACACAGCTGAAAAAAAAAAAAAAAAA 

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA^ 

AAAAAATTTT 

Sequence ID - 526 nt : 516 

CTTTTCATGGTCTCTTGTTCATTAATCATCTAAAATCCAAGCNCAGAGAATTCAAT 

TTTAGATGGTCTCCAGAGCAGAATTTGATGTATAATCTTAATTACAAATCATAGAT 

AATTAATATTGNTTACAAAATC2\NAATACGATTAGAGGTAGGGATCCTGCACACAC 

CCTATTTTCCTCCGCAGTGTTCTGACCGAGAGACTAATTAATAATTCAAGGAACTT 

ACAGTGAATGANAACCCATGGTTTTGCTTAATTATCAGAACAGCTAGATCTGAGAA 

CAGCTGTCTCCCACATGGATAGACACTTATTCCACCCATTTGCAGGTAGAATAGCT 

GGCAATAATAAGTCCTTCCCATTGGATATGTTGAAAGGTGCCTGCCATGGCATAGT 

TGCCACAAGAGAGGAAGAAATGGACACAAATGTAGGCTGTTTTCAGGGCANAGGGA 

AGGTGGGAGGAAACCAANTTGCTGGTTTTCACACACCCTCTGGGGAACACCCATGC 

ACCTATGANATG

Sequence ID 527

GACAAAAGCTGAGAGAATTTTTTTCTTGAATATTTGCACTAAAAGATAGGTTAAAA 
TTCTTCAGGCTGAAGAGAGCATACCAGGTGGAGATTTGGATCTACAAAAAGGAAGG 
AAGATTTGGAAATGGATTTGGC^CCATTGACTC^TTTCC^GAACAAGAAAGCAG^ 
GACAGTTTTGGGAAGCTCAAGACACACTGCCCATGAGCAGCAATTTGGACCTCCTG
CTGCATCCACTGTGCATCAAACACACACTGTACAGACAAAGACTCCGAGGAAAAGA 
AGTATAAAC&TGGACTAACACAGAGATGGGCAAACT 
CTGTTTATGTAGAATCCAAAGTAAGAATCTTTAACTTACACATAAACTT 

Sequence 529/ 660nt

GACAGCAGAGCACACAAGCTTN^ 

ACCATCTCACTGTGTGTAAACATGACTTCCAAGCTGGCCGTGGCTCTCTTGGCAGC 
CTTCCTGATTTCTGCAGCTCTGTGTGAAGGTGCAGTTTTGCCAAGGAGTGCTAAAG 
AACTTAGATGTCAGTGCATAAAGACATACTC^ 

AAAGAACTGAGAGTGATTGAGAGTGGACCACACTGCGCCAACACAGAAATTATTGT
AAAGCTTTCTGATGGAAGANAGCTCTGTCTGGACCCC^AGGAAAACTGGGTGCANA 
GGGTTGTGGANAAGTTTTTGAAGAGGGCTGAGAATTCATAAAAAAATTCATTCTCT 
GTGGTATCCAAGAATCAGTGAAGATGCCAGTGA^ 
CACTTGkTGTATTGTGTGGGT 

TGGTTAAATTTGAATTTCAGTAAACAATGAATAGTTTTTCATTGTACCATGAAATA
TCCAGAACATACTTATATGTAAAGTATTATTTATTTGAATCTACAAAAAACAACAA 
ATAATTTTTAGATATAAGGATTTTCCTGGATATTGCACGGGAGA 



Sequence ID 529 

GACAGCAGAGCACACAAGCTTNTAGGACAAGAGCCAGGAAGAAACCACCGGAAGGA

ACCATCTCACTGTGTGTAAACATGACTTCCAAGCTGGCCGTGGCTCTCTTGGCAGC 
CTTCCTGATTTCTGCAGCTCTGTGTGAAGGTGCAGTTTTGCCAAGGAGTGCTAAAG 
AACTTAGATGTCAGTGCATAAAGACATACTCCAAACCTTTCCACCCCAAATTTATC 
AAAGAACTGAGAGTGATTGAGAGTGGACCACACTGCGCCAACACAGAAATTATTGT 

AAAGCTTTCTGATGGAAGANAGCTCTGTCTGGACCCCAAGGAAAACTGGGTGCANA

GGGTTGTGGANAAGTTTTTGAAGAGGGCTGAGAATTCATAAAAAAATTCATTCTCT 
GTGGTATCC^AGAATCAGTGAAGATGCCAGTGAAACTTCAAGCAAATCTACTTCAA 
CACTTCATGTATTGTGTGGGTCTGTTGTAGGGTTGCCAGATGCAATACAAGATTCC 
TGGTTAAATTTGAATTTCAGTAAACAATGAATAGTTTTTCATTGT 


Sequence ID - 530 nt : 660 

GACAGCAGAGCACACAAGCTTNTAGGACAAGAGCCAGGAAGAA^
ACCATCTCACTGTGTGTAAACATGACTTCCAAGCTGGCCGTGGCTCTCTTGGCAGC
CTTCCTGATTTCTGCAGCTCTGTGTGAAGGTGCAGTTTTGCCAAGGAGTGCTAAAG 
AACTTAGATGTCAGTGCATAAAGACATACTCCAAACCTTTCCACCCCAAATTTATC 
AAAGAACTGAGAGTGATTGAGAGTGGACCACACTGCGCCAACACAGAAATTATTGT 
AAAGCTTTCTGATGGAAGANAGCTCTGTCTGGACCCCAAGGAAAACTGGGTGCANA
GGGTTGTGGANAAGTTTTTGAAGAGGGCTGAGAATTCATAAAAAAATTCATTCTCT 
GTGGTATCCAAGAATCAGTGAAGATGCCAGTGAAACTTCAAGCAAATCTACTTCAA 
CACTTCATGTATTGTGTGGGTCTGTTGTAGGGTTGCCAGATGCAATACAAGATTCC 
TGGTTAAATTTGAATTTCAGTAAACAATGAATAGTTTTTCATTGTACCATGAAATA 
TCCAGAACATACTTATATGTAAAGTATTATTTATTTGAATCTACAAAAAACAACAA
ATAATTTTTAGATATAAGGATTTTCCTGGATATTGCACGGGAGA 

Sequence ID 532 

GAATTGTGATAGTTCAGCTTGAATGTCTCTTAGAGGGTGGGCTTTTGTTGATGAGG 
GAGGGGAAACTTTTTTTTTTTCTATAGACTTTTTTCANATAACATCTTCTGAGTCA
TAACC^GCCTGGCAGTATGATGGCCTANATGCAGAGAAAACAGCTCCTTGGTG 
TGATAAGTAAAGGCAGAAAAGATTATATGTC^^ 

ACCCTGAGATTCTTACTACTGATGAGAACATTATCTGCATATGCCAAAAAATTTTA 
AGCAAATGAAAGCTACCAATTTAAAGTTACGGAATCTACCATTTTAAAGTTAATTG 

CTTGTCAAGCTATAACCACAAAAATA

AATGTCCATCTCAAAATACTGCTTTTACAAAAGGAGAATAA?^AGCGAAAAGAAATG 
AAAATGTTACACTACATTAATCCTGGAATAAAAGAAGCCGAAATAAATGAGAGATG 
AGTTGGGATCAAGTGGGATTGANGANGCTGTGCTGTGT 

Sequence ID 533

CTTGAACCTCGGAGGCAGAGGTTGCAGTGAGCCGAGATCACGCCACTGCACTCCAG 

CCTCGGGGACAGAGCAAGACTCCATCTCAAAACAC^ 

CACACAGACACACAAAACAGATATA 

ATTTAAAAGGGTTAGAGATGTAAAATGGATCTAGGAATGGAAACCATAAGGNGGGA 

TTTATCAACTGGATTCTGCANAATGCTGTTAAGGCCAGATGTTAGCAGGTGTTACA

TAAAAAAGGGATACCATGAGCAAAAGTATTTGAAGATGGGCAATGGTTGAAAC 

TTTAAACAGATTATNTTTATTACCAAATCTCTCAAACCTTTAATATGCTATAAACA 

TTGTGAAACAATAAAAAAACTTTCCAAAA 

Sequence ID 534

GGGAAGGGAGCTATGAGTGTGTGTGTTGTGTATGGACTCACTCCCAGGTTCACCTG 

GCCACAGGTGCACCCTTCCCACACCCTTTACATTCCCCAGAGCCAAGGGAGTTTAA
GTTTGCAGTTACAGGCCAGTTCTCCAGCTCTCC^

GCAGGCCTGCTTGCAGGAAATGAATCCAGCAGCCAACTCGAATCCCCCTAGGGCTC 
AGGCACTGAGGGCCTGGGGACAGTGGAGCATATGGGTGGGAGACAGATGGAGGGTA 
CCCTATTTACAACTGAGTCAGCCAAGCCACTGATGGGAATATACAGATTTAGGTGC 
TAAACCGTTTATTTTCCACGGATGAGTCACAATCTGAAGAATCAAACTTCCATCCT
GAAAATCTATATGTTTCAAAACCACTTGCCATCCTGTTAGATTGCCAGTTCCTGGG 
ACCAGGCCTCANACTGTGAAAGTA 

Sequence ID 560 

GGCGGAGGTTGCAGTGAGCTGAGATGGCGCCATTGCTCTCCCAGCCTGGGTGACAA
GAGCAAAACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAGC^ 
ATACAI^CACAGAGACAAGTATTTTTGAGAAACAAATACCTTTTTCATTTT 
CCAATGTAACAATAATCCATTAAACACACCTTTAC 

ATATGATGAGGAAATAGGTAAACCTTTAATAGCCAGTACTAAATTAGAGTGGCACA 
ACTTTCACTGGGAAAAAAGATGGGTATTTTACTTTTCTGTTTTAGAAAAGTGGCTT
GACAACAGTATGCTTATGTCTTAGAGTTTGAAATTGZ^GTTCTTGAACATTATTAA 
TGGCTACAATCATTCATACCCACATTGGGCTGTATTCTTGATGAATCCA 
TTTCACCTCAACTCTGAATTTCA^
CTAGAGGAAGCATTTCAGTCTTTTCTGATTGGAGATTCATTATTGTTTTAGATAAT 

GTTTTCATTTGCTTATGGGTATATAAAAAATTTTATCTTAAAAATATTTCCTCTCA

TTTAGCTAGCAACATTGTTTTC 

Sequence ID 561 

CTCAGGGTGATCTCTGAACCC^AACTTGCCCCAAAGAAGGTTGCTCTGTCCTCTCC 
ACATCCCCATCTCCTCCCTAGGGCCTTGTTGGGGAGAGGCTCCTCCATCTTTCCCA
AGTC^CACCATCGTTTCCTACGTGGTCTGGACAAGAGCAAGAGCACACCTTGTCCC 
CACCTTCTCCAGAGCAGCCAGAACCCACCTCAGGTGCCTTCCCCATCCGGTGCAGT 
TAAGGCACTTCTGCCAGCACCATGGTATGAGCACTAGACTTGGAGTTAAGATTTGA 
GAGCCCCCTCTGTCACTGTGGAAGCTTGAGCATGTTGCTTGATCTCTCTGAACCTT 

GTGTTTCTCATCTGTGAAAGGTGATAATGTGGGGCTGCTGTGAGATTTAAAGGACA

TAATGCACCTACGGTCCAAGCACTGCCTGGAAT^ 

GGAO^CCC^TCCCCTTAGTAGAGGC^CTAACCATGTGACCCAAGGCAAAAGTGCT 
TAAAAAAA 

Sequence ID - 562 nt : 580

ATTGCATGCAAGTTTGCTGAGCTGAAGGAAAAGATTGATCGCCGTTCTGGTAAAAA 
GCTGGAAGATGGCCCTAAATTCTTGAAGTCTGGTGATGCTGCCATTGTTGATATGG
TTCCTGGCT^AGCCCATGTGTGTTGAGAGCTTCTCAGACTATCCACCTTTGGGTCGC
TTTGCTGTTCGTGATATGAGACAGACAGTTGCGGTGGGTGTCATCAAAGqACTGGA 
CAAGAAGGCTGCTGGAGCTGGCAAGGTCACCAAGTCTGCCGAGAAAGCTCAGAAGG 
CTAAATGAATATTATCCCTAATACCTGCCACCCCACTCTTAATCAGTGGTGGAAGA 
ACGGTCTCAGAACTGTTTGTTTGAATTGGCCATTTAAGTTTAGTAGTAAAAGACTG
GTTAATGATAACAATGCATCGTAAAACCTTCAGAAGGAAAGGAGAATGTTTTGTGG 
ACCACTTTGGTTTTCTTTTTTGCGTGTGGCAGTTTTAAGTTATTAGTTTTTAAAAT 
CAGTACTTTTTAATGGAAACAACTTGACCAAAAATTTGTCACAGAATTTTGAGACC 
CATTAAAAAAGTTAAATGAG 


Sequence ID 563 

GCAACCTGCACAACCCCGCCCTGTTGGAGGGCCGGAGCCCTGGCGTGTGGGAGCTG 
GCCGAGGAGTATCTGGACATCGTGCGGGAGCACCCCTGCCCCCTGTCCTACGTCCG 
GGCCCACCTCTTCAAGCTGTGGCACCACACGCTGCAGGTGCACCAGGAGCTGCGAG 

AGGAGCTGGCCAAGGTGAANACCCTGGAGGGCATCGCTGCTGTGAGCCAGGAGCTG
AAGCTGCGGTGTCAGGAGGAGATATCCAGGCAGGAGGGAGCGAAGCCCACCGGCGA 
CTTGCCCTTCCACTGGATCTGCCAGCCCTACATCCGGCCGGGGCCCAGGGAGGGGA 
GCAAGGAGAAGGCAGGTGCGCGCAGCAAGCGGGCCCTGGAGGAAGAGGAGGGTGGC 
ACGGAGGTCCTGTCCAAGAACAAGCAAAAGAAGCAGCTGAGGAACCCCCACAAGAC 

CTTCGACCCCTCTCTGAACCAAAATATGCAAAGTGTGACCAGTGTGGAAACCCAAA
GGGCAACAGATGTGTGTTCAGCCTGTGCCGCGGNTTG 

Sequence ID - 564 nt : 671 

ACATAAGATAAACAAAAACCTTACCACCAAACATACCAAAATGCACCTCTTTCATA

AGTGAGTTACTAAGATTTCTATACCTGGAATATCATGTATGTTTCATTTACTGGAT 
GTTTACATTTTAGGAAGGAAAATAGTTTTGTTTATTTAAACAACTGAATACTTATA 
AACTGTTGTTCCTGGAAGTTATTTATTCCATAAAAAATTTGTTCTTTTGTCATGAA 
TTTATAATTCCTAAATGAAGACCAGAAAGTACAAATTGCTGGGAGGAAGAATAGGC 

TTTATTAATCAACTGATGTCTTGATTTTTCTAAATGGGAAGATTGCTTTATTTTTA

ACACTAATTATGGGAGCAGATTCTTAGCAAACTTCTTTGGAAAAGTTAATGTTATG 
ATGTGCATTAGGCTGCCCCATCGTGTATATAAATGAAGCAGATTTGATTTTTGTAT 
TCTTACGTTTCTCTGCTTTGTAGTTGTGGCTGTACTTAAAGAAATACAGAATTTCA 
TATATTTAAAAATGTTTAAAATGTGACCCACAGACATTGTAAATGGATTNAAAACT 
AACATGAAAAATATTCAACCTAAAAGAATTCTTAACTTCACAAGTGTTTTACTTC

Sequence ID 565

CTTGGTTCCGCGTTCCCTGCACAAAATGCCCGGCGAAGCCACAGAAACCGTCCCTG
CTACAGAGCAGGAGTTGCCGCAGCCCCAGGCTGAGACAGGGTCTGGAACAGAATCT 
GAGAGTGATGAATCAGTACCAGAGCTTGAAGAACAGGATTCCACCCAGGCAACCAC 
ACAACAAGCCCAGCTGGCGGCAGC^GCTGAAATCGATG^ 
CAAAACAGAGTCGGAGTGAAAAGAAGGCACGGAAGGCTATGTCCAAACTGGGTCTT
CGGCAGGTTACAGGAGTTACTAGAGTCACTATCCGGAAATCTAAGAATATCCTCTT 
TGTCATCACAAAACCAGATGTCTACAAGAGCCCTGCTTCAGATACTTACATAGTTT 
TTGGGGAAGCCAAGATCGAAGATTTATCCCAGCAAGCACAACTAGCAGCTGCTGAG 
AAATTCAA&GTTCAAGGTGAAGCTGTC 
AACTGTACAAGAGGAGAGTGAAGAGGAAGAGGTCGATGAAACAGGTGTAGAAGTTA
AGGACATAGAATTTGGTCATTGTCACAAAGCAAATGTGTCGAGAGCA

Sequence ID 566 

GTCACCAAGAGCTTGTTGTCAGGTTTTCACTTGCTATTCGCAGAGATTTTTTTTAA 
AGGCACTATTTGTAGTGTTAAAAGGGTGAATTTATCANAAGGCATAATAATCATAA
ATGTGTATATGCCTAATAATAGAACTTTAAAAGGCATGAAGCAACACTCAAAAGGA 
TTAAAGGGAGATCATCTCACCCCCTTCTTACCAATTGATAGAATGATCTGATGAAA 
ACAGTAAAATAACAACAGATCTGAACACTGTCAACCATCTTGACJ^AATACTTATGC 
CTAGTGTTCCATTATTGGAACACTAAACATGTGGAATGATTTATATCCTACTGCTC 

AAGGTCATC^.CCAAGGTCTAATTGTAAAATTTCAAAAAATTGCAACCTCAGGCATA

AATGGGTTAATCGACATTTATAGCACACACATGCAACATGTACCAGAGATTCCTTC 
TTTTCTATGAACATGGTACTTCCACCAAGATAGACCACATTGTGAACTATAAAACA 
AATCTAAAAACATTTGAAATGAAGGAAATTATATAAAATATGTTCTCTTGATCTCA 
ATGAAATTAAATTAATACTATAT 


Sequence ID 567 

CTCATGGCGGCCAATGTAGGCCCAAAACTTCCTCAAGTCAAACTCTCCAGGCCCAC 
CTTCTGCTTCCCGGTGGCATCAAGAGGCCCAGCTTTGACTTGAGAACAGCCTCTGC 
AGGCCCTGCTCTTGCCTCCCAGGGGCTTTTTCCAGGCCCAGCTCTTGCCTCATGGC 

AGCTGCCCCAGGCCAAATTTCTGCCTGCCTGCCAGCAGCCTCAACAGGCACAGCTC

CTCCCTCACAGTGGCCCATTTAGGCCCAACTCATGACTGTGAGGCCATTTCCAGGC 
CTAGTGCCTGCCTCGTGGCTGACTCTTGAAGCCCAAAACTTCCTCAAATCAGCCTT 
TTGCCCAACTTCTGTCTACTGTCGGACTCTACAGGTCAGCCTCTGCCTCACAGTGG 
ACCCTCCAGACCCAGATGGTGTCTNCTGTGGCATCCTCAGGCGAAGCTCCTGCCTT 
TCGGCAGCCTCTCCAGGCCCAGCTCCTCCTGCTCCAGCCTTCTCTCCAGGCTCTGA
ACTTTCTCAGGTCTCCCTCTGTTGTCCAAGGCTGGAGTGTAGTAG

Sequence ID 568

TATATATGTAATGCCCTTAACCTAGTGTTTGGCATGATCGTTGCTGAAAGGGAAGC 
TTGTGGGTACAGTGTCCCCTCAGAAGCCAAAGCCCAGGGAAGGTCGCCTGCCCAGG 
TCAGGCTCCCAGCGAGTTTGTCTGGGGAGGGGCCATTCATACCTCCAGGTCAGGAC 
AGAGGCTCGGGCTGAGGGAACCCTACACAGGTCCTGGAAGCAGATCCTTCCTGCCT
AAGCCAGCAGGACAGCTCAACAGGAAGCATCTTCCAGCCACGGGAGGAGAGGCAGC 
ACCTTTTTTGGAACCATACAGAGCTAAGAATGGTGGTACAAGTAATAGATTCTGTA 
CTGGCAACCCCACTTGGTGGAGCAAGTTCTAGGAAAAGGGGGCTGTCCTTGAGTCA 
GCCATGGGGTCAGCCACACAGTCACCGCAGCTGCTCTTTGGCACCGGGCGCTGGAA 
AGACCTAGGATGACACAGCCTGGAAAGAGCTTGGGAAAAGCTCATCTTCCACAGAA
CTACCTGCTATACCAGCCAGGGCAGGTGCTTATTCCCACAACAGCCCTCTGTTGTA 
GGCGGCAGTGCCATCCTGAANGTGCCGTGGTACCTTCTGAANACCCAGCTGAGGGC 
CTGTAATGGCAgTTGCATGCCACATGGNACACCCTTTCCCGGTTAA 

Sequence ID 570

ACCGCGGCCGCGTNAANAAAAAAAAAAAAAGAATTCCACTTGATCAACTTAATTCC 
TTNTCTTTATCTTCCCTCCCTCACTTCCCTTTTCTCCCACCCTCTTTTCCAAGCTG 
TTTCGCTTTGCAATATATTACTGGTAATGAGTTGCAGGATAATGCAGTCATAACTT 
GTTTTCTCCTAAGTATTTGAGTTCAAAACTCCTGTATCTAAAGAAATACGGTTGGG 

GTCATTAATAAAGAAAATCTTTCTATCTTAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA^^ 
AAA 

Sequence ID - 571 nt : 457 

TTAGAGAGGTGAGGATCTGGTATTTCCTGGACTAAATTCCCCTTGGGGAAGACGAA
GGGATGCTGCAGTTCCAAAAGAGAAGGACTCTTCCAGAGTCATCTACCTGAGTCCC 
AAAGCTCCCTGTCCTGAAAGCCACAGACAATATGGTCCCAAATGACTGACTGCACC 
TTCTGTGCCTCAGCCGTTCTTGACATCAAGAATCTTCTGTTCCACATCCACACAGC 
CAATACAATTAGTCAAACCACTGTTATTAACAGATGTAGCAACATGAGAAACGCTT 
ATGTTACAGGTTACATGAGAGCAATCATGTAAGTCTATATGACTTCAGAAATGTTA
AAATAGACTAACCTCTAACAACAA^TTAAAAGTGATTGTTTCAAGGTGATGCAATT 
ATTGATGACCTATTTTATTTTTCTATAATGATCATATATTACCTTTGTAATAAAAC 
ATTTTTCCC 

Sequence ID 572

CGTCTATTTGNGTTTCTTCTdACAATTGGTAAGTTCTCTGTATTGATTGATGGCTA
AGTTTGATTAGTGTTTTTCTCTAGTTGGTAATTATATTCTAGTATTTTATCATCTT
ATTGTTTACTCAACTNAAAGTGNCACAGAAGAGTTGCCAGGTTTCTCTTTGATATG 
AGATCTCTNOTTGATTTGGAATGCAAATCANAAGTGTC^TGTTTTGAATAAAGGGA 
CCAGATGACTTATAGGTATTCTTTCTCTAAATATAACTAAGGTAAGATTTTTGTTT 
TGAGGTACTTAATCTATATAAGTGGTAAAGAATTTACTTGAATTTCTCCAAATTCT
CATGTCTAAAGTCTGATTGATTAAATTCATTCTTGGTATTTCATTTTGAAAAGAAT 
GTAGCTTTAGCAAACCTCTTTGTATAAATGCAGTGGGATTAAGGTCATTTAAAAAA 
TTGTTATATCATTGTATTTTTAAAATTTACCAGTTTTATTTTTCTTTTTACCCTTT 
AGCCCGGCCTCAGAAAGTGTGTTTGTGTCCATTTCTCCCAGCGCACCCTCTGCATA 
TCTCTACCCACTTGTCATAATTCAGCAT

CAGTTCCTCTACTAGCAGCATGCCTCCCCCAGGACAAGTGTA 

Sequence ID 574 

TTATTGCTGACATAAAAATGGTGCAC^TCGGCCAGGGCCCAGGATGAATC^GCCAA 
TCTGCACCATTTATACATGGAACTGGAGAACATTGTGCCAATAATCATTTAATATA
TGCCAAATCTTACACGTCTACTCTAAACTGCTCTAATGAAGTTTCAGTGACCTTGA 
GGGCTAAAGATTGTTCTTCTGGGTAAGAGCTCTTGGGCTGGTTTTTCANAGCAGAG 
TTCTTGTTGTGGGTAGACTGTGACTAGGTTCACAGCCTTTGTGGAACATTCCGTAT 
AACGGCATTGTGGAAGCAATAACTAGTTCCTATGAAAGAACCAGAGCTGGGAAGAT 
GGCTGGGAAGCCAGGCCAAAGTGGGGGCAACAGCTTGCTTCTCTTTCTCTTCTCAC
CCTCAGTTTGTATGGGAAAATGGAGATGTCCTCTCCACTTTATCCCACGATATCTA 
AATG 

Sequence ID - 575 nt : 209 

CAGGATATCGAGACCATCCCAGACAGCATGGTGAAACTCCGTCTCTACTGGAATAC

AAAAAGTTAGCCGTGTGTGGTGGCACGCGCCTCTAATCCCAGCTATTCGGGAGGCT 
TAGGCAGGAGAATTACTTGAACCCGGGAGGCGAAGGTTGCAGTGAGCTGAGATCGC 
ACCATTGCACTCCACCCTGG- CGACAGAGCAAGACTCCGTCT 

Sequence ID - 576 nt : 541

CAGCCAACCCAGAAGGAGCCAGTCTACAACTATGCCTGATCCTCCTCATGGCAGGC 
CACGAAGCATTGCTGCCATGTGTTGAATTATAAAACCCACATTGCTTTTTGAACCC 
TGTTGCGGGTAAAAATAACCAAATTATCAGTCCTTGGAAACCCAGGCAATCAAGTG 
AGTACAAGGTAAAGATAAGTATGGTTTAGAGGAGAAATTATGTTCCTGAACTGGTG 

TCCTTTGATGGCAGCGTCAGCCTTGCTAAGTCAGAGTAGAGGGAGCAGTGACCTTA

ATAAGCTTTGGTGAGCATCATGTGCACGCGTGGGTGGGAGTCCCTTTCACTGATGC 
TTTTAJU^GTGCTTTTGCAGACCCTGGAAGGGATCCTC(^CACATATGAGGTGTG^
GACAGGTAGGCCAGAGAGGATTAGCCCTGCTTTCGAGACTAGAAATCTACAGTCCT
GAAGGAGCAGTAATTAATTGGTACACCTGTCAGGGCCAGCCCCCAGGTCTCCTGGC 
TTTTTCCAGGTTTTCTGTCTCACATGATTTTGCTTTT 

Sequence ID 577

CTTTAATTTTTCAAGTGTTTAAAAAACAATTTTATACTTAAGCCAGCCTTGAAGAT 
AAGCACAAAATTTACC^GTTTACATTTAAAAAACAAACAAAAAACGACAACAA^ 
AAGCACCCGCTCTGTGCATAGCACTATTCTAGGTGCAATAAAAGGGAATCTTAACC 
TTAGAAATATGAGTTCACTTTCTGGAATTGTATTATCTCCTTTTCCAGAGAGTAAA 

AATAAATAAAATCACCATTGTTTACTACAGATCTGCCCCAAACCACATGTGGTTCA
CAGAAAGGCTAATTTCTGCCAAATTAAAGATGTAATGAACTCAGTTCCTGCTTTCC 
GAAAAACACGAAAGC^GAATTCCTTTTCACTGAAAAAAATAAACAGTTTTCCATGC
AAGGGCAGTTTGCTTCTAATAAGTATTTTTTAAAAAATTTTTTTTTCCTCTAGCTT
TTCTTTAAATTTTCTTCCTCTAATATTGCCTTTTCTTGTACAAGGCAGACCAGGTA 

TCTTTTTATGCTGTTTTTCCTTTACTAAGAAAAGTATTGCATCTTGAAGACAAACC
ATTTCCCAGAGTAGTGATAAAAAATAACACTAAAAAAACTTTAAAGGTGAGTCACT 
TCATCACCTTGATGAAGTAAAAAA 

Sequence ID 578 

GGAAAAAATATTTCCACTTAGATATTTTACATGGTTTTGTTTAAAATTACCATTAC
TTGTTTTTTAAAAACACATGACCACATATGTATATGTATATCTACCTAAACATTGT 
ATCATGGTTTCAGTATGTTATTCATGTATTACTGGGAGATGCTACCAAGAAACCAA 
CCCAAAGAAAATTCTGGAAAATACATTTCTATTTATAGAATAAATGTTTCATTTAT 
ATAAAAGCAAAAGAACTTAGAGTTCTAATAAATGGGATGTCTAATAAATTATGAAG 

TTACTGATTTGAATATATTATATTTTTATAACTTCCTTGCCAAAGTCCTGATTTAG

TACATTAGAGAACCTGTGTTTCCTCTCTCCTCTACCATTCATCTCTCTTCCATACA 
GTCATTTGGGCTTTTTACTCAAAGAGAATCAAGAAATAATAAGGTATAACAAGCTT 
GGCAAAGTGTTGGCTTTTTAAAAAAAAATTTTTTTAATCTCTAGCAGTTTGGTAAT 
TTAGCAGCATCATTTATTTGGGATTCTTTTATCTGATTTCAACAGTGAAAAACATC 

CCTATGATAAAGCCTAATGACCCATTTCCAAAAGATGGAATTGCCCTTCCTAGAAA

ATATGACGGAGAAAAGT 

Sequence ID - 579 nt : 502 

CGAATAGCCAAGTGGTCTGACAAGATCGAGAGTAATGAGGCCCATACTTTAGTACA 
GTCTTGAATGGCCAGATGGTGCTGGGCATACCCCAACCAGAGATATGTAAGTCTTT
ATGTTGTCAAAATTTCCCAGAAACATGAATTTCCCACTAAGATTCATTAAGGAAAA
CTAGAATGAAAACAAAAACGTTCCTTGTATAATATTCATTANAAAGAAATGAAGAA
GGCCGGGCATGGTGGCTCACGCCTGTAATCCCAGCACTTTGAGAGGCCAAGGTAGG
CAGATCATGAGGTCAGGAGTTTGAGACCAGCCTGGCCAACATAGTGAAATCCCGTC 
TCTACCAAAAATACAAAAAAATTAGCCGGGC^TGGTGGCACACACCTGTCATCCC^ 
GCTACTCAGGAGGCTGAGGCAGGAGAATTGCTTGAACCTGGGAGGTGGAGGTTGCA 
GTGAGCTGAGATTGCACCACTGTACTACAGCCTAGGTGACAGTGCAAGACTCTG

Sequence ID - 580 nt : 316 

CCTATGCCAAACTAAAGAAAGCTTGCCTGGCCTACAGGCCTAAAGGTTCAAATGNG 
GATTAAAAAAACACAGTAGTC^CATAAAAT^ 
ACCTACAATTTACCTGCTTTCAAAAACTGTGTTCAACATTGAGAAAACAGAAAACC
ACTTATCTTGAGCTTAATATGGGCTTCTTTTTCCTTAACTGTAGAACACTTACTGA
AATATCAAATO^TGGTTAGGATATGTATCCTAGGC^GGCCTAAACC^TTAACACT 
TGGTTTAAGCAACTTTGTATAATTNACCTCCTAAAT 

Sequence ID 581

CTTCATGAGTGCCCGGTTGCCCAAGTCAAAAACCTGGGAGTGATATAAACTCCCC1A 
CACATCCAGTCAGTCACTCATCAACTCTATTGATTCTG^CTGCT^ 
ATTGTATTAACTTAAACATATGCATAATACATCTTCTTCTTCACTGCATTTTTGTG 
GGCTGCACTTACCTTTCAGGTAACAACAACACTGGCCCCTCTTGCCCTTCTAGTCA 

GAAGTGCC^AAATGATGAGAGCTAGCCATGACAAACC
AATGTGCAAAACTGGAAGGGCATCCAAACAGAGGAGG 

Sequence ID 582 

TAGAATTCTCGCCTGCCTTGGCTTCTCCCTCTAGTTGTTCCTTCTCTGTCTTCTGT 

GGGCTTCTTATTGTCTGCTCACTCCTTCTTCAGTGTCCTCTCATGGGCTTCCTTCC

CTTCTCAGCTGATGCCATCACCTGGGGAATCACAGTTACTCAGCAGCACTGGGGCC 
TCTCTATCTCTATGCTGGTCATGCCTATGTGTGAGCTGCAGACCCAGTGGAATTTC 
CATTTGTGCATCCCATGCCCAGCCCACCCTCCACCAGCCTCGAATGCAGCTGTTCA 
GCCCTACCCCAGTCCTCAGAAAAGTTCCTCTCCCTGGATCCTCTTTTTCCTTCATG 

AGTGCCCGGTTGCCCAAGTCAAAAACCTGGGAGTGATATAAACTCCCCACACATCC

AGTCAGTCACTCATCAACTCTATTGATTCTGTCTGCTAAATATATCTCAATTGTAT 
TAACTTAAACATATGCATAATACATCTTCTTCTTCACTGCATTTTTGTGGGCTGCA 
CTTACCTTTCAGGTAACAACAACACTGGCCCCTCTTGCCCTTCTAGTCAGAAGTGC 
CAAAATGATGAGAGCTAGCCATGACAAACCCACAGCCAACATTACACTGAATGTGC 
AAAACTGGAAGGGCATCCAAACAGAGGA

Sequence ID - 583 nt : 631

CTGAGGTGGGAGGATTCCACTCTCACCCATTTCTTCTTTCATTTTCAGTTTCTCCA
GTTAGTAACTGAAGATGTTCTTTGAGTAATTAAGTGAGTGAGAAAATTTTTAAGTG 
AGAAATCTATAAAAAGAACCATGTTAACATAAATATTTCAGTCCTTACAAGTTGGT 
ATTGACTTTTCTCATTGGTAATCTGACTGATTTAATACTGCTCATTCCAATATCTG 
GTGATGTAATTCTGGTTATGAATCCTTGTATTAATAACACCTCCTGGGAGGTTTTT

CAGGAGGAGGGAGCTTATGTTTAATGTGGTGGATAAAACTTAACTGCTGGTTAATA 
CAATTGTTATTCAGGTGAAATTCCCTAAACTTTTCACGTGCAAAGTTTTGTATGTA 
TACAGACATTTGGGGAAAAGTTTTATCATCCCTAAAACCGGTTACTGTCCAGAAAA 
TGATAAGAATCCCTGGGTTCCAAATCCTTCATAAGGTATTTATTCATTTATTTATT
CAACACATTTACTCAATGCCTCCGCT 
AATCTAACCGAAAAT 

Sequence ID 585 

TTTCAAATTGTACAATAACACAAACAACTTTGTTAAGGCCATGTTTTATTTGCTGA
TTAATGGACAAAAGGCAATGTAATTTATTTTCAAGTATTTTCTTGAAAGTCTGTGC 
TCATAAAAATCATGAAl^AGTTGGAAAGACTGTTAAATCACTGAAACTTCAAATATA 
TCTTACACAATCTTGTTTGTACAAAAATACAAGTTAAATATAAACATAAAGCAATC 
ATGGTAATTTTATGCAAATCTGTTTTATGTGATCATCAGTTATATATAAAAGTTTC 

TCAGTTCTGTTATTTGTGAAAAGATCAATACCAGATTGAATGACTACCTATTGGCA
AAGGGCCCTAAAAAGCTTACTTTAGCACTCATCTTTTACATGGTTAAATGCATTTC 
CTAATTTGAGATCACCTAAACACTGGAAAAGAAAAAAAATGAAAGGGCAGTATGTC 
CATAAACCAACAAATAATTTGGCTGTAATGTATCATAAAACACAAACCCCACACAT
CTGTACAATAAACATTATGTATTACATACACACAACACACACCCAGTCATAAAGCC

TAATGATGTGCTGCTTCCAGTTCAATATTCAGCTGTGCATTTTTTCTTATTTCATC

AAATGAATAGCTTTTTGTCACC 

Sequence ID 586 

GTAAACTGTTCTCTCCGAGGGAAAAAATGGAAGTTATCCTCACAGTTCACTGCCGT 

GGTATTTCTTCTGTCCCATGCTTTGCATGACTGCCATGGTACAGCCTTGTTTCAAA

CTGTTCACTGTGATCTGTGGGTCTTTGAGTTTCAGTGAGTTTGCTGAAATGTCGAA 
GAAGTAGTTCCAAACTTCAATGTTCAATGAAATTTTTGTTCAAGTTTGAAATGGAG
AGAGCAGCTTTAAAAGGTACTAAGCCTTTTACAAATTGGTGAGTACTGGCACATGA
GAT 


Sequence ID 587

TGACTCCAAGGNACACACAGTCCTGTATCTGGAACTACTGAGTGGCAGGCATCTTT
CTCTGCCTCTGACAGTGGAGTCCCCATCACTGCAGAGCATAGCCA2VAGGAGTCAAA 
GGTCTCAGCGGGTCACTGCCTTATCAACCCTCACCAGTCCCTTATGTTTTTTAATA 
TTTTATAATCTTGACATGACACCAAGATGCTTTAATAAAAAAGCACCTCTAACTCG 
GTCTTGTATTCACTTACCTTGAGCCTGGGACTTCTCTAGGCTCCTGAGGCAAAAAC
AGGTAGAGGGGAGATGGTGGAACATAAAACACAATTTTGCTTGGCACCCACCTTGG 
CGTCTGTCCCGATGACCAGGTCTTTCAATTCGATGATTTTGTCATTGATGGAGGAG 
CGATATCGTTTCTCAATGATATTATGGGTTGTCCGCCTTTCTCCTTCTTTGGGGGG 
CTCAAGCTGCTTGACTCCCCCAGGTACCTGCTTAATGGGGCACTTTCTCTTGCCCC 
ATC^TTACAGGCATTGTGGTC^GAATGGTCC(^CTGCTGCCCACCAGGGTCTA

Sequence ID 588 

CTAGTCTTTTCATAGTCTGCATAGAGTCTQpCCATTACCATCAGTTTTTAAGATGT 
CCATATTGTGGCCGGGCGCGGTGGCTCACGCGTGGTAGTCCCAGCACTTTGGGAGG 
CTGAGGCAGGTGGATCATGAGGTCAGGAGATCGAGACCATCCTGGCTAACACGGTG 
AAACCCGTCTCTACTAAAAAAAATATTAAAAAATTGGCCAGGCCTGGTGGTGGGCG 
CCTGTGGTCCCGGCTGCTTGGGAGGCTGAGGCAGGANAATGGTGTGAACCCGGAAG 
TCGGAGGTTGCAGTGAGCCAAGATTGCACCTGGGCAACACAGCGAGACTC 
AAAAAAAAAAAAAA 

Sequence ID 589 

CAATTATTTATTACCTTTCCATTTGTTCGCCTGATGATGTGACAATGCATGGTCTT 
TGTGCATGCTGCTAGACACTTTTCTTTCCCAGCCGAAAAGTCTATTATGTAATTTT 
TACATTCATAATTTTAATGTGGATGATCAGGATTAAATCAAGATATATATCTGGAA 
CCTCTTATAAATGGAGCACTTAGAAATTTGTTGTTCTGCACTTAACCTAGAGAGAG 
AAAAAATGCTTTTCTTTGTGAAAAATCTGAATTCCTGTCCTGACCTTCTGTGATGT 
GGAAACCCTAGGCTCTGAGACACACTCTCTGGTGTCTGAGACAGAACCAAAGCAAT 
AACGTTGTGATGCCCACAGGCCTGGAGCGAGCTAGCGACCTTGTGCCGCCCAGCTG 
TCCATGGCCCGTGCAGAGCAGAGGACAGTGAGTGTCTGCACTGAGAACCTTAAACC 
ACAGTTGAACATACCCACACCTGTTTGTCTTAAGCTATAGTGTAAAAACAAAGTTT 
GGGCTCTGAAAATTTAACTGAAAAAGATTTCCTTGTT 

Sequence ID 590 

GTGGCAGCAGGCGCAGCCCAGCCTCGAAATGCAGAACGACGCCGGCGAGTTCGTGG 
ACCTGTACGTGCCGCGGAAATGCTCCGCTAGCAATCGCATCATCGGTGCCAAGGAC
CACGCATCCATCCAGATGAACGTGGCCGAGGTTGAGAAGGTCACAGGCAGGTTTAA 
TGGCCAGTTTAAAACTTATGCTATCTGCGGGGCCATTCGTAGGATGGGTGAG^CAG 



ATGATTCCATTCTCCGATTGGCCAAGGCCGATGGC^TCGTCTCAAAGAACTTTTG^
CTGGAGAGAATCACAGATGTGGAATATTTGTCATAAATAAATAATGAAAACCTAAA 

Sequence ID 591 

CAGCAGCAGAAATGTTTGCAAGATAGGCCAAAATGAGTACAAAAGGTCTGTCTTCC 
ATCAGACCCAGTGATGCTGCGACTCACACGCTTCAATTCAAGACCTGACCGCTAGT 
AGGGAGGTTTATTCANATCGCTGGCAGCCTCGGCTGAGCAGATGCACAGAGGGGAT 
CACTGTGCAGTGGGACCACCCTCACTGGCCTTCTGCAGCAGGGTTCTGGGATGTTT 
TCAGTGGTCAAAATACTCTGTTTAGAGCAAGGGCTCAGAAAACAGAAATACTGTCA 
TGGAGGTGCTGAACACAGGGAAGGTCTGGTACATATTGGAAATTATGAGCAGAACA 
AATACTCAACTAAATGCACAAAGTATAAAGTGTAGCCATGT 

Sequence ID 592 

tactc^tgaaaaaccatgataAttctttgtatataaaataaacatttgaaaaaaa 

AAAAAAA 

Sequence ID - 593 nt: 565 

CAGGATCAAGGTGAAAAGGAGAACCCCATGCGGGAACTTCGCATCCGCAAACTCTG 
TCTCAACATCTGTGTTGGGGAGAGTGGAGACAGACTGACGCGAGCAGCCAAGGTGT 
TGGAGCAGCTCACAGGGCAGACCCCTGTGTTTTCCAAAGCTAGATACACTGTCAGA 
TCCTTTGGCATCCGGAGAAATGAAAAGATTGCTGTCCACTGCACAGTTCGAGGGGC 
CAAGGCAGAAGAAATCTTGGAGAAGGGTCTAAAGGTGCGGGAGTATGAGTTAAGAA 
AAAACAACTTCTCAGATACTGGAAACTTTGGTTTTGGGATCCAGGAACACATCGAT 
CTGGGTATCAAATATGACCCAAGCATTGGTATCTACGGCCTGGACTTCTATGTGGT 

gctgggtaggccaggtttcagcatcgcagacaagaagcgcaggacaggctgcattg 
gggccaaacacagaatcagcaaagaggaggccatgcgctggttccagcagaagtat 
gatgggatcatccttcctggcaaataaattcccgtttctatccaaaagagcaataa 

AAAGT 

Sequence ID 594

CAGAAGAGTAAGCAAATCTCAAAGCAGCGAAAGGGAAGAAACTAAAAAAGGTAGAG 

CAGAAATAAGAGAAAATAGAGAAGAGAACAATTGAGAAAAATAATTGAAACCAAAA 

GGTGGTTCTTTGAAAAGCCTAACAAAATGGACACATCTTTAGTTAGAGTGACCAAG 

AAAAAAGGGCAGTGACTCAGATTACTTCATTCAAGAGTGAAAGAGGGCACATCACT 

ACCAATTTACAGAAATAAAAAGGATTATGAGGAAATACTACAGATAATTGATGACA

TTAACTTAGAAGAATATATTTCAAGAAAGACACAAACTACTGAAACCGACTCAAGA 

AGAAACAGAAAATCTGAACAGACCTATAAAAAATAGAGATTTAATTGATATTCAGA
AAGTTTCCCAAAAAGAAAAGCACTGGCCAAGATGACTTCACTGGTGAATTCTATCA
AGTGTCAAAGATGAATTACTGACATTCATTCACACTCCTTTAAGAAATAGAAGAGG 
GGACATCACTTTTCAAAGCATCGACATTCTAATCATTAGTCCCTTGGTTTCCTGCT 
CCCAAAGCCAGGTGATGTATCACAAAAAAACCCCT^ 
GCTTTATGCCTAT

Sequence ID - 595 nt: 98 

CTTTGCTCGAATNGTCAGATAAGGATTCTGTGAANGGAGATGAGATTTCCATCCAT 

GCTGACTTTGANAATACATGTTCCCGAATTGGGGNCCCCAAA 


Sequence ID 596 

CTCAAGTGTTCCCTCAGCTTAGGCTTTGTTTAAATGATCCCACCCAGGGGCGATGG 
TAGGGAACAACAGGGTGACTAAACTATTTGGCTGGCTACAACT 

AGACAGGGAAAGGCCATGTTGTTCATTCCCTTGTGCAGATCTAGGGAGAACCGCAG

AGAGAACAGTTAGCATTTCTTGTTCAATGAATTATCCTATTAAGAACACTGGATGT

Sequence ID 597 

CGGNCGCGGTCGACGCTACTCCTACCTATCTCCCCTTTTATACTAATAATCTTATA 
AAAAAAAAAAAAANAAAAAAAAAAA 


Sequence ID - 598 nt : 362 

GGCATGTGCCTGTAGTCCTAGTTGCTGAGGTAAGAGGATTGCTTGAGCCCAAGAGT 
TCAAGGCTGCAACAAGCTTTGATTGCGCCACTGCACTCCANCCTTGGCGACAGACT 
AAAACGCTGTCTCAAAAAAAAAACAAAAAC^ 
ATTAACTTAGGCAATGACAGTCCCTGGCAAATGCTGGGAGGGAGGCAACANTGGT
AAGGAAGGTAACCCTGAANCAGGACTTGTAAAGCAAATAANATTGGGAGGCCAAGG 
TGGGTGGATCACNAGGTCAGGAGTTCGAGACCAACCTGGCCAACATAGTGAAACCC 
CGTCTTTCTAAAAATACAAAAAAATT 

Sequence ID 599

GACAAAAGAACCATTTGGATACATAGGTATGGTCTGAGCTATGATATCAATTGGCT 
TCCTAGGGTTTATCGTGTGAGCACACCATATATTTACAGTAGGAATAGACGTAGAC 
ACACGAGCATATTTCACCTCCGCTACCATAATCATCGCTATCCCCACCGGCGTCAA 
AGTATTTAGCTGACTCGCCACACTCCACGGAAGCAATATGAAATGATCTGCTGCAG 

TGCTCTGAGCCCTAGGATTCATCTTTCTTTTCACCGTAGGTGGCCTGACTGGCATT
GTATTAGCAAACTCATCACTAGACATCGTACTACACGACACGTACTACGTTGTAGC 
TCACTTCCACTATGTCCTATCAATAGGAGCTGTATTTGCCATCATAGGAGGCTTCA
TTCACTGATTTCCCCTATTCTCAGGCTACACCCTAGACCAAACCTACGCCAAAATC
CATTTCACTATCATATTCATCGGCGTAAATCTAACTTTCTTCCCACAACACTTTCT 
CGGCCTGTCCGGAATGCCCCGACGTTACTCGGACTACCCCGATGCATACACCACAT 
GAAACATCCTATCATCTGGAG* 

Sequence ID - 600 nt : 595 

TTCAAATTCTTGNTAANAGTCTTTGTTCTGl^ATTTTACTTTGTCTGTTATTCCTAT 
AGCCTTTCCAATTTTCTTTCGCTTGGATTTTACGTGATAAGTTTTTTCCCCCATTT 
TACTTTTANCAACTCTATATTTTTTAGTTGAGGTTGGGTTTCTTGTAAACAGCATA 
TAATTTGGGTTTTTTAATCCAATCTGAAAATTAATGTCCTTAATTTTGTGTTTATA 
CCATTTACACATAATGTACTCATATATAAGGTTTAACTGAAACCTACTATCTTGCT 
AGTTGTGCTCTACTTGAATTTTTTTTTAGTATTCTGTTTTAATTGACCAACATTTG 
ACTGTATCTCTTTGTGTAATTCTTTTACAGGTTGCTGTAGGCATGACAATATATAC 
ACTTAACTTTTCTCAGTACACTGAGAGTTGAAATTGTAGTACTTCGAGGAAAACAT 
AGAAAACTTGCAATGATATCGGTTACATTTTACCACCTCCATATGTTGCAATTATT 
AAATGTATTAGATCTGCCTACCTCGAAAACCCATCAGTCTTT^AACTTTGCTCTCA 
ATGGTGATTCATATTTTTAAAAAAACTTGAGGCAA 

Sequence ID - 601 nt: 522 

TCGACCGGGTTTGGAGCAGTGCCTTGTTTGCTGTGCAGCGGATACTCTACAGGTAC
ATTTCCTTTTTGGAACCAAAAGGGAGGGATTTGACAATATTGATGGTAGATCTTTT 
TTCTTTAGCAAGAATTAAGGATTTTGGTGGGTGGGGGGAGGCTTCTGTGGGGACCA 
AGACAATGTACTGTCAGTCAGGATTTAAGTCGAACTACCTCATCCCTTGCCCCAGA 
GAACAGTTGATCGTGTTTTAAACCAAAAGGTGCGGAATGGAGAGAGGGAGGCGGTG 
CATTGCAGCTTCCGATAGAGCTTTTTATTTTTGGATATCAGGAACCAATTTTGAAG
ATTTCTTAAGAAAGTCATTTACATCAGGGACATGAAGAGCAAAGTAGGTATTTTTG 
GTCAGTACTTGAATTTGATAGGCTTTATGCAAACAACTCTCCCTCTGCTGGAGTCT 
GGCAAGTTTGCTTTTCACTGGACGCTAATTCAAGTGCCATACAAAACTAAAATAAN 
AGTTTTACTTATAACACA 


Sequence ID 602 

CAGAAATCGCAATTGAAGACCAGATTTGTCAAGGTTTGAAACTGACATTTGATACT 
ACCTTCTCACCAAACACAGGAAAGAAA 

GGAGTGTATAAACCTTGGTTGTGATGTTGACTTTGATTTTGCTGGACCTGCAATCC 
ATGGTTCAGCTGTCTTTGGTTATGAGGGCTGGCTTGCTGGCTACCAGATGACCTTT 
GACAGTGCCAAATCAAAGCTGACAAGGAATAACTTTGCAGTGGGCTACAGGACTGG 
GGACTTCCAGCTACACACTAATGTCAATGATGGGACAGAATTTGGAGGATCAATTT 
ATCAGAAAGTTTGTGAAGATCTTGACACTTCAGTAAACCTTGCTTGGACATCAGGT
ACCAACTGCACTCGTTTTGGCATTGCAGCTAAATATCAGTTGGATCCCACTGCTTC 
CATTTCTGCAAAAGTCAACAACTCTAGCTTAATTGGAGTAGGCTATACTCAGACTC 
TGAGGCCTGGTGTGAAGCTTACACTCTCTGCTCTGGTAGATGGGAAGAGCATTAAT 
GCTGGAGGCCACAAGGTTGGGCTCG 



Sequence ID - 603 nt: 624 

gacacacgagcatatttcacctccgctaccataatcatcgctatccccaccggcgt 
caaagtatttagctgactcgccacactccacggaagcaatatgaaatgatctgctg 
cagtgctctgagccctaggattcatctttcttttcaccgtaggtggcctgactggc 
attgtattagcaaactcatcactagacatcgtactacacgacacgtactacgttgt 
agcccacttgcactatgtcctatcaataggagctgtatttgccatcataggaggct 
tcattcactgatttcccctattctcaggctacaccctagaccaaacctacgccaaa
atccatttcactatcatattcatcggcgtaaatctaactttcttcccacaacactt
tctcggcctatccggaatgccccgacgttactcggactaccccgatgcatacacca 
catgaaacatcctatcatctgtaggctcattcatttctctaacagcagtaatatta 

ATAATTTTCATGATTTGAGAAGCCTTCGCTTCGAAGCGAAAAGTCCTAATAGTAGA 
AGAACCCTCCATAAACCTGGAGTGACTATATGGATGCCCCCCACCCTACCACACAT 

TCGAAGAA 

Sequence ID - 605 nt: 338 

ACCTGAGGCCTCGGTGGGGCCAGTGCGACGCTGGCTTAAGGAGCTGGAGGGGTTCC 

TAATACACATTTAATTCAGTTTCTCTTCCCTAAGAGGCTGCCGGAGTTGGGGCCTC 

CTCCAGCAGAGACCCTCGGACCCCTGCAGGGCCTGGACTTGGGGTGAACAGGGCTT 

CAGTCAGCGCAAGTATTCCATTTGCATTTGGTAATTTTTCATGCCACCTATTTATG 

AATATATAAATCTTTATACCAAATCTATTTTTTAAAACATGGAAAAGTTGCCTTTA 

TGGAAACTTGGCAGAGCCAGAGTGTACACATTCCTAAACCATTAAACAGATTTCTA 

TA 



Sequence ID - 606 nt : 556

GGATAATGATACCTCTGACCTTTCTTCCTTTTGGGAAGTACTTGAGTGTGCAGCTG 
CATGAGGCCTCAGCAGGAGAGAGATTTTAGGTCCAAGAAGCTATACCAGTAGGACA 
AGGCAGGAAAATACTACACTTTCAGGATCAAGCCCCTCTGACTCTCATTTGGAAAC 
TGGATGTTTGCTAAGCACCTGCTTCTTAAGGATGCCGAGGGATTTAATGATACTCC 

CAGAAACCTGGAGAGATTAATGGGGCCTATGGAGAAGTGCTCTGAACTCAGTGTTG
GGACTTGAATAAAATTAACCATTGTCATGTTTTCAGAACAACTAAGCTGTTTTATA 
TTTCATGTGCATGAAAGCCCTAGAACTAAGTTGTGTTATTTCCAGAAATGAAATAG 




ATCCCACAGTTAGATGATGTGGCCATTAGGAAGTACCAAATTTATAAAAATCACTG 
GAGGTCTGTCTGAGCAGTACCTAATAAAATATAGTATACTGAAAGTGAACAGATCT 
TTGTCTCTTTCTTTGGCTGCTTGATACTTTATCTGTGTCTGCCGGACAGTGC 

Sequence ID 607

G^TAAAAGCAGGTTAACCTCAATGATAGCAGTTAAAATGTTCTATCTTATGTATT 
TCTTTTAAGTATTACCATTATGGTGCTACTGAGCGTTTTCTTTTGGTAAAAAGAAA 
AATGCCATGGGCTGCAGTCTTCTTCCATCACTTTTCCCTACCAGGTCCATTAATAT 
GCTTATAACACTAGTGCCAGTTATTTTATTTGATAATGCTTATGGTATTTGTATAT 
TTGTTTGCATTCCAATTTTGTTTAATAATGAGTGTGTAAACTGCATACGTTAAATA
AATGTAAATACTAATGTACTGCTGC 

Sequence ID 609 

TTTTATTACCCAAGTTTTAACCTCTGTCTGGTGATTTGTTGTTGTTGTTGTTGTNG 
TTGTTGTTGAAGTTCAGGCTGCATGTGGGATAGGTTTGCTCAGGCATACTTCTTAG
GAAGTAGTCACTTGCATGACTGTTTTTGGGATAACTCTTTGAGTATTTGGAGAGGT 
CTATTGTAACTTCTGAAAGGCATTGTTTTTACGTATGAATGTTCTAAAATTCATTC 
TAAATGGTCATGAAAAGAAAAGGATTCACATTTTAGAATGGCAATAGTCCCTGAGG 
ACTATTATGTCTTTTAGATTTCCTGTGGGTTTCTAGGAATGTTAGTGTAACTTANA 
TTTCCACCTACCTGATTTCTGGATGTGCCTATTGGAACTTGCTGAGATCTTTTTTT
TTCCTTAACATGTTGTCCCCTTGACCCGTACTTCGAAACTAAACATATTATTTTAT 
TTGCTTACACTTCAGGAGGCAATTGGCAGACACCAGGCCAACAGTCT

Sequence ID 610 

GCTCTGACCCCAGTTGGAAATGTATCTGTACTTTGTCCGGCTTCCACT.CAAGGACC
ATTTATGACATTGCTTGGTGTCAGCTGACAGGGGCTCTGGCCACAGCTTGTGGGGA 
TGACGCGATCCGCGTGTTTCAGGAGGATCCCAACTCGGATCCACAGCAGCCCACCT 
TCTCCCTGACAGCCCACTTGCATCAGGCCCATTCCCAGGATGTCAACTGTGTGGCC 
TGGAACCCCAAGGAGCCAGGGCTACTGGCCTCCTGCAGTGATGATGGGGAGGTGGC 

CTTCTGGAAGTATCAGCGGCCTGAAGGCCTCTGAGCTACCTCGACTTTGGACAGAG
TAATGACTCCCCAGAAAACGTCATATAAGACTTTACCAGCCCCTGAGAGGACCAGG 
AGGAGCATCCTTGACCTTCATTTAACTTGGCTCACTTCTCTTCANACTTGGGTAGA 
AGTGCAGAGCCACAAAATTGCTTTCCTTCCCCGCCTTTGACATGAGGCCTTCAGTA 
AAG 


Sequence ID 611 
TGCAGGATCCGTCGACT

Sequence ID - 612 nt : 576

GAGAAATATAAGATTATGTATAGATCAAATCTACCTCTATTTGGTGTCCTGAAAGA 
GATGAGGAGAATGGGACAAACTTGGAAAGCTTATTTCAAGATAACATTCCTGAGAA 
CTTCCCCAATCTTGCTAGAGAGGCCAACATTAAAATTCAGTAAATGCTGAAAACTC 
CAGTAAGATATTTCTTAAGAAAATTATTCCCAAGATATATACTCATCAAATTATCT
AAGGTCAAATGAAGGAAAAAATTTTATAGGC^^ 
TACAAAGAGAATGGCATAAGAGAAAAAGTAGAACT 
CAGAAGAGATTAGGGGCCAATATTTAACATC 
TCATATCCAGCCAAACTAAGCT^ 
CAAGCAAATGCTGATGAAATCCATCACCACCAGACCTGCCTTATAAGAGCTCCTGA
GGGAAGCACTAAATATTGAAAGGGAAGAACTTTAT 
TAAGTNCACAAAGCAG 

Sequence ID - 613 nt: 341 

CCTTATTTTACAGGTGAAAAACCACGAATCAGATAGATTTTTATTTGCCCAAGTCA 
CATAATATTAAGAACAGGCCAAGTGTGGTGGCTCATGTCTGTAATCTGAGCACTTT 
GGGAGGCTAAGGCGGGTGGATTTCCTGAGCCTAGGAGTTTGAGATCAGCCTGGGCA 
ACATGGCGAAACCTCATCTCTACAAAACATACAAAAATTAGTCAGTGTGGTGGTGA 
GAGCCTGTAGTCCTGGCTACTCGTGAGGCTGAGGTGGGAGCATCACCTGAGCCTGG 
GAAGTCGAGGCTGCAGTGGCAACAGAATGGGTAACCTGGACATCAGAGTGAGACCC 
TGTCT 






Sequence ID 614 

CTCACACCTGTAATTCCATTACTTTGGAAGGCTGAGAGAGGAGGATCAGTGGAGCC 

CAGGAGTTTGAGACCAGCCTGGGCAATATAGGGAGACCCTGTCTCTACAAAAATGA

AATAGCCAGGCGAGGTGGCATGTGCCTGTGGTCCCAGCTACTTGGGAGACTGAGGT 
GGAAGGCTGCCTTGAGCCCAGGAGTTCCAGGCTGCAGTGAGCCATCATTATGCCAC 
TGCACTCCAACCTGGGAGACAGAGTGAGAGAGACCCTGTCTCAAACAAACAAACCC 
AAAATAGGCGAGGCACAGTGACTCATGCCTGTAATCCCAGCACTTTGGGAGGCTGA 

AATAGGCGGATCATTTGAGGTCAGGAGTTCAAATTCAAGACCAGCCCGGCCAACAT

GGCAAAACCACATCTCTACTACAZ^ATAAAAAATTAGTTGGGTGTGGNGGAGCATTC 
CTGTAATCACAGCTATTCAGGAGGCTGAGGCATGANAACCGCTTCA 

Sequence ID - 615 nt : 379 

TAAATTTAAAACATTTTAATTAGCTGGCATGATGGCATGCACCTGTAGTCCTACCT
ACTTGGGAGGCCAAGGCAGGAAGATTGCTTGAGCCCAGGAGTTTGAGCTTACTGTG 
AGCTGTGATC^CACCACTGCACTCCAGCCTGGGTGACAAAGGAAGACCGTATTTCT 
AAAAAATAAAAAATACAAATACAACTAC^

GTACCATGAACTGAGGAATATTATTAATTCCACCATTTGCATCTGAGGTTAACAAT 
ATGTCAATGACTTAAATAACATCATATCTCTGAGAGTAATTTCTCCTATATTTCCA 
TGACAAATGTTAGATAATTTTCCATTTTTTCCATTCAACAAAA


Sequence ID 617 

TTTTCAGGC^TGTCAGAGAAGGGAGGACTC^ 

TGACATCCTCCTTC^GGAACACGGGGAGCAGAGGCCAAAGC^CTAAGGGGAGGGCG 
CATACCCGAGACGATTGTATGAAGAAAATATGGAGGAACTGTTACATGTTCGGTAC 
TAAGTCATTTTCAGGGGATTGAAAGACTATTGCTGGATTTCATGATGCTGACTGGC
GTTAGCTGATTAACCCATGTAAATAGGCACTTAAATAGAAGCAGGAAAGGGAGACA 
AAGACTGGCTTCTGGACTTCCTCCCTGATCCCCACTCTTACTCATCACCTGCAGTG 
GCCAGAATTAGGGACTCAGAATCAAACCAGTGTAA 
CTGGTCACATTGAAATTGGTGGCTTCATT 


Sequence ID - 618 nt : 598 

GATTAACTTTCATTTTAAGCTCTTCTCTACTAATTCTGTTCGTATGTTTATTCATT 
TTGCGTTGATCATATTTTGTACACCAGGCACTCTTCTCAGTTTTATATGTGTGTTA 
ATTTACTCCTTTCAAGAGCCCTATGATAGATGAATTTATCTCCATTTTATAGATGA 
GGAAATTAAGACCTAGAGTTACTGAACTTGCCCAAGGTTATACAGCTGATGGGTAG
GGCCAGAACTTTGCCTCAGAGAATCTGAATTTCCAAAAAATAACCTAAAAGAGAAA 
TTTAAGTACTAATTAGTAAGCAAAGAAATGCACA^ 

TAAGGAAGACAGTAACCTTTTATCTATTAGAGAAAAACACACATTCTGTCTTTAAC 
ACACACATAAATCTTATATTGGCAGGGATTTTCTTTATTCAGCAATTATTTATTGG 

TTGTCTGCTTTGTGGTACACATAAATGCTGGGGATAAACACTTAATAAAATATACT

TCCTTCTCTTGAATATCTTGCACTTTAAGTGGGAAGGTAAGTCAACAGAGTAGAGG 
TGATATATCCAAGTGATAGACTGTTTCATTGCCAGTAG 

Sequence ID 619 

GTTGCCTGAGAGTGACCTTTGCATCTGCCTGTCCAGCCAGCATGGAACCAAAGCGG

ATCAGAGAGGGCTACCTTGTGAAGAAGGGGAGCGTGTTCAATACGTGGAAACCCAT 
GTGGGTTGTATTGTTAGAAGATGGAATTGAATTCTATAAGAAGAAAAGTGACAACA 
GCCCCAAAGGAATGATCCCGCTGAAAGGGAGCACTCTGACTAGCCCTTGTCAAGAC 
TTTGGCAAAAGGATGTTTGTGTTTAAGATCACTATGACCAAACAGCAGGACCACTT 
CTTCCAGGCAGCCTTCCTGGAGGAGAGAGATGCCTGGGTTCGGGATATCAATAAGG
CCATTAAATGCATTGAAGGAGGCCAGAl^TTTGCCAGGAAATCTACCAGGAGGTCC 
ATTCGACTGCCAGAAACCATTGACTTAGGTGCCTTATATTTGTCCATGAAAGACAC 
TGAAAAAGGAATAAAAGAACTGAAT
Sequence ID 621 

TGGTACTGAACCTACGAGTACACCGACTACGGCGGACTAATCTTCAACTCCTACAT 
ACTTCCCCCATTATTCCTAGAACCAGGCGACCTGCGACTCCTTGACGTTGACAATC
GAGTAGTACTCCCGATTGAAGCCCCCATTCGTATAATAATTACATCACAAGACGTC 
TTGCACTCATGAGCTGTCCCCACATTAGGCTTAAAAACAGATGCAATTCCCGGACG 
TCTAAACCAAACCACTTTCACCGCTACACGACCGGGGGTATACTACGGTCAATGCT 
CTGAAATCTGTGGAGCAAACCACAGTTTCATGCCCATCGTCCTAGAATTAATTCCC 
CTAAAAATCTTTGAAATAGGGCCCGTATTTACCCTATAGCACCCCCTCTACCCCCT

Sequence ID 622 

TTTTTCTTGTTTTTGTGTGTCTACCTTGGCATATACTAAAGGAAGGTGTGTATTCA 
TTTATTACATGATATCTCTGGGTTATAATTATTTACATATATGAATTTGAAAGAAA 
GATTGAGAGGGATATGTGTGACCTTTGTTTCATTATGATCATTTACATGACTAAAG
ATAAAGATCATATGTCTGATTTTCAGTTTAATGGCAAGTTACTTAAAATAAATGAA 
ATATGTTTTTATTGTTTTCGTGGGTTTGATGCTTTGTGTTTTATTTCAAGTAACTT 
GAGAATGCATTGTGTTTGGTACTGTTTTTTATGAATATCATTAAAAATTTATTTAA 
GGAGAGAGTAATTTTGCAATAATATTTTTGATTTATTTGAAAATAAAATTCAAGAT 

AAATGAAATAATTGAAATTTTCTAAAGAAGGAATTGAATATATTTTTACATTTGAA

TGAACTAAGGATTAACTGAACCATTTATATATAGTACTTTCAGAACTGAATGTCTT 
AAATGATAAAGCTCTAATTGGTTAAAGTGACTTTCTTTCAAGTCAAAGAACCCAGA 
AACTGAATAGATGATCTAACTACTGCCACTGAGGTTTTGGATTAGTGAGTATAAAT 
TT 


Sequence ID 624 
TGCAGGATCCGTCGACT 

Sequence ID 625 

GACAATCAGAGCAGATCTTGGGCTTCTGTGGCTCATCTCAGCCCTTTATAACTGGC

CTGAGAAGAGGGTTTATCTACTTGTGCAAGTGGCCCAGAAATCTCACTCGTACATG 
AGGCTTTGGAACATCCTTGCAAAGGTACGCTGAAAGCAAATTGCTGTTTTCCTGGT 
GGTTCTGCACGTTTCCTAACTTTTATCATAGTTTGATTTTCATTATTTAAGAAAAA 
ATAAAAAATCCAAAGACCATAAGATGGCATTAGATTTTTTACC3ATTAAATTATTAA 
TGCCTATTTGGTGCTCATAAAGATTAATCATGTCACGCATGTTTCCAATCTTTCTT
TTGCAGTATATTATTTTCTAAAAATTGTTACATGCAAATTTAAACCAAGATTTATC 
AGTA 
Sequence ID 626

ttggaagaaataaacg^ggcagaaaaattttaaatggccaaaa.taaattgtattg 
ctaacttagatggccagagatgggggcaggggtggagagaggagaaattgaaaa.cn 
ccacaaagaccccgc^tggctagaActtga^^ 
agcctccttaagtcago^aaagataaagattgatccaatgttctatattacag^
(^gagcagattgtcaatatagcaaataaagttaccgttgagtggactgcgctgtnt 
aagctgcttggttggccttaagtgccgacaattaagagatgaaggc^atgagaact 
gaaac^aacatttaagttc^gaccc^gttt^^ 

ctctttgggcctcagtttacttatctgtaacattaagaggttggattacatgatgt 
agtgcagtggcatgatcatagctcacagcag 

Sequence ID 627 

ccagcctgtcactggcctggccaaggaggagagacaggccagggattctggtccta 
actctactggcc^cactgtgtggcctgagacccccctttccctccc^agcccctgc
ctccgcatctgcgtggtgaaggccattggccctcatcggtggatctgcgtttcctc 
gggcctacactgtctaggattgtgcggggctggtgagagaacaagatctcttccgt 
gttcaaggcagacttcctgccccctgcaccctgctctctcccaggccttgaggtca 
gtgtgagccccaagggcaagaacacttctggaagggagagtggatttggctgggcc 
atctggatggaaggtaaaaaaaagaaaatcccttgaaaggagattgagggaagttt

Sequence ID - 628 nt: 419 

aagagaaaggactcagtgtgtgatccggtttctttttgctcgcccctgttttttgt 

AGAATCTCTT(^TGCTTGA(^TACCTACCAGTATTATTCCCGACGACACATATACA 

TATGAGAATATACCTTATTTATTTTTGTGTAGGTGTCTGCCTTCACAAATGTCATT

GTCTACTCCTAGAAGAACCAAATACCTCAATTTTTGTTTTTGAGTACTGTACTATC 
CTGTAAATATATCTTAAGCAGGTTTGTTTTCAGCACTGATGGAAAATACCAGTGTT 
GGGTTTTTTTTTAGTTGCCAACAGTTGTATGTTTGCTGATTATTTATGACCTGAAA 
TAATATATTTCTTCTTCTAAGAAGACATTTTGTTACATAAGGATGACTTTTTTATA 

CAATGGGAATAAATTATGGCATTTTTT

Sequence ID 629 

CTGAGAGTCACTGTGTTTTTAGCCAAATCTAAGGGAGAAAATGAATATTGATAGCA 
GCATGCTGTAGCCAGCTCCTTAAAGGAAGGATGGTGCCTGGTACAGAGTTAGAGTT . 
AGTGCTTCAGTAAATAATGAATGTGTGCTAGGTAGGTTCTGCTGGGTAGGCTGCAT
GCATTGACCAATTTATTCCTCCTTGTTTCAAAACAGGATTTAAGGGCACTTATATA 
TATATATTTTTTAGTTTTTTTAA.TGTAAATGAGAGAATAAAGATATATATATATGT 
CTATATATGTATATATGTATATATATGTCTATATGTCTATATGTATATATGTCTAT
ATGTATATATGTGTGTGTGTATATATATATATATATATATAAGTTTTCTGTTGCTA 
GGATAACAAACTACCAGAA^CTTAGCAACTGAAACAACATGAATTTATCTTACGGT 
^CTATAGTTCAGAAGTCTAACGTGTCACTGGGATGAAATCCAGGTTTGAACAGGAC 
TGGGTTCCCTTCTAGCTCATTCAGCTACCTGGCTCATTCAGGTTGTNGGCAGAATA
TACTTCCATGAAACTGTAGGGCTGAGACCCCGTTCCTTCCTGGCTATCATCTGAAA 
ACTTTC ^ 

Sequence ID 630 

AGGCGCAGCCCAGCCTCGAAATGCAGAACGACGCCGGCGAGTTCGTGGACCTGTAC 
GTGCCGCGGAAATGCTCCGCTAGCAATCGCATC^ 

CATCCAGATGAACGTGGCCGAGGTTGACAAGGTGACAGGCAGGTTTAATGGCCAGT 
TTAAAACTTATGCTATCTGCGGGGCCATTCGTAGGATGGGTGAGTCAGATGATTCC 
ATTCTCCGATTGGCCAAGGpCGATGGCATCGTCTCAAAGAACTTTTGACTGGAGAG 
AATCACAGATGTGGAATATTTGTCATAAATAAATAATGAAAACCTAAAAAAAAAAA 
AAAAAAAAAAAAAA 



Sequence ID 631 

TNCACTCACAGACTCCCAAACCTTAAC^ 

GGCCAGCCTCTTTTATGCTCCTCACATGTTTCCTTTAACTGGAATACCCATGACAG
CTCCCTACATAGTTACTTGTAAACTCCTCCTCTCTGTATAAGTTTTCCTGAATTTT 
TTTGATAAAATTAAGTTGTGCCACCCCTTTATGCTCTCTTANAACTTTGTTCTGTT 
CTCATGGCTGTTCTGCAACGAATCTCATTGTGTTCTCCTACTCAATTACATTCCTG 
CGTCTCCCACTAGATGGCAGACTCTTTGAGAGTAGGAGATTCCCTTGTTATCTCTG 

GATCCCTGGCACTTGCAGAAAGCCTGTTACGTAATAATTGCTCAACAATTAGTTTT
TAAATAAATGAATTATTTTTAAAACGCCAAAATTACAATGATTGTGCATTAAGTGA 
AAGATGACCATCTAAAAACATAAAGCCATGCTTCATGACATTGGC 

Sequence ID 632 

GACCATTCAGGGAAATTTTATAAAAAATGCAGATACTGTCTTGAGCAGATCGAAAT
GCCGATGAGGTGGATGCAATTTCCTTTTGTGCAAGCAGTGCACGGTGCCCCCCCCT 
CGGGTGTCCGTGCTGTGCCTTAGCTTCCCCAGGTGCCGGGACTCACACCTGCTAGG 
GGCTGGGCAAGGCCCCGGCTCTGCTTTCTCTGAAGGGCTTGTCCAAGTTCATTGCC 
CTGTTACAGGTGGTCAAGACGTCCGGCCGCCTTGACCCAGGCTACCCTTAGCCAAT 

ATCCTCTGCCCCTGGGTGGTTGGTGGCTGGGCCTCAGGGTGGGCAACGTTAGGGGT
TTGGCGAAAGCCCGCCCCATGGGATTGAGGGACGGGGCTGCACTCCAACCGTCTGC 
ACCTGCTCTTCCCCCACCCCTGTGGGACCTCATCTTCACGTGCCATGTGTGCTGAA 
GGCCCAGGGCCCAGCAGGGGGCAGTGGCACCTGTTGACGGAAAAGCCGAGGTGCTT
ACCAATGGACCTTCTGGCCCGCCCTCCCCTGTACTTGTCGGGCATTCAGGGCCCCG 

ACCTGTGCCTACCCGCA 


Sequence ID 633 

CAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCACCTGACTCCTGAGG 
AGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGT 
GAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTC 
CTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAGGTGAAGGCTC 
ATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTC 
AAGGGGACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCC 
TGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTGTGTGTGCTGGCGCATCACTTTG 
GCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCANAAAGTGGTGGCTGGTGTG 
GGCTAATGCCTGGCCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATTTC 
TATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAA 

GGGCCTTG 

Sequence ID - 634 nt : 511

TTTTTTAATTTCACCAAAATTTGTTGACGTCCCTTGATTTGCTGATAGGGACAATA 
ATTAAATATTTTCCACTTGTTTTTATAAAAACTGTAATGGTGATTTGTTTAACAGA 
TGTTGACTTAGCACCTTCTCTCTTTTTTTTTTTTTTTTTTTGAGTTGGAGTCTTGC 
TCTGTCACCCAGCTGGAGTGCAGTGGCACGATTTCGGCTCACTGCAACCTCCGCCT 
CCCAGGTTCGGGCGCTTCTCCTGCCTCAGCCTCCCANATAGTTGGGATTACAGGTG 

TTGTTGCCCATGCCGCTCTTGAACTCCTTGGCCTCCCAAAGTGTTAGGATTACAGG 
CGTGAGCCACTGTGCCTGGCCCCAATTTANCACCTTACTGGGTGCTGAGGCTGTGA 
GCCATAGTAGAATGCATGTGATCCAGGGCCTTGCTGI^ATTCATGGGCTAATAGGGA 

GCCTGAC 

Sequence ID - 635 nt: 592

TGAGCGTTGGGCTGTAGGTCGCTGTGCTGTGTGATCCCCCAGAGCCATGCCCGAGA 
TAGTGGATACCTGTTCGTTGGCCTCTCCGGCTTCCGTCTGCCGGACCAAGCACCTG 
CACCTGCGCTGCAGCGTCGACTTTACTCGCCGGACGCTGACCGGGACTGCTGCTCT 
CACGGTCCAGTCTCAGGAGGACAATCTGCGCAGCCTGGTTTTGGATACAAAGGACC 
TTACAATAGAAAAAGTAGTGATCAATGGACAAGAAGTCAAATATGCTCTTGGAGAA 
AGACAAAGTTACAAGGGATCGCCAATGGAAATCTCTCTTCCTATCGCTTTGAGCAA 
AAATCAAGAAATTGTTATAGAAATTTCTTTTGAGACCTCTCCAAAATCTTCTGCTC 
TCCAGTGGCTGA.CTCCTGA^

CAGTGCCAGGCCATCCACTGCAGAGCAATCCTTCCTTGTCAGGACACTCCTTCTGN 
GAAATTAACCTATACTGCAGAGGTGTCTGTCCCTAAAGAACTGGTGGCACTTATGA 
GTGCTATTCGTGATGGAGAAACACCTGACCCA 


Sequence ID - 636 nt: 572

CTTANAAGAGTTGCTC^TTCAC^CCCACGCCCTTGCCC^AGGCTGGCCCACTCAGA 
GCGAAACTTAACTTTTGTCTGGATGGGAAGAGAAGTAAGTCTACCCCGAGGTTGCC 
ATGTTGAAGAGTGAGAGGTCCAAGTGATTCTGTGCATTGAAACCAAGACACCCCAC 

CCAGAACACTTCTTCCCTCCCTCAGCCCAAACCAAAGGCTGGGGTTCTCA.TCTCCA
AGTGGCTGTTCTCCAACTTTCCCAAGCCGCTTGCATTCCCCAGACTGGACTACTGT 
GGCGGTTAGGTTAGATTTGAAGACGGGGGCCAGGCTGGGTATGAACGGGTGCAGCC 
CTCTTCTCCTCTTCCCCCCCACATCTCTCATGAGAGAGGTAGTGGCATTTCCTTCT 
CAGGGAGCTTC^^TGGGAAAGGTCTCGAAAGCTTCAGGAGGAGCAGAATACCAACG 

CAGGGGGATGGCTGTAACGATCTCACCGTCTCCTAACCTCAGTCCCTTTTTTGAGA
GTGAATGGTGGAGGGTGGGAAAGGGACCCAAATTTGTAGATCTCTTTGTCTGGGGG 
AGGGGAANGATG 

Sequence ID - 637 nt : 482 

TTAAAACAGGCGCAGGGGTAAAAATGAGAATGAATCTGAAAAAAGAGAGTTGGTGT
TTAAAGAGGATGGACAAGAGTATGCTCAGGTAATCAAAATGTTGGGAAATGGACGA 
TTGGAAGCATTGTGTTTTGATGGTGTAAAGAGGTTATGCCATATCAGAGGGAAATT 
GAGAAAAAAGGTTTGGATAAATACATCAGACATTATATTGGTTGGTCTACGGGACT 
ATCAGGATAACAAAGCTGATGTAATTTTAAAGTACAATGCAGATGAAGCTAGAAGC 

CTGAAGGCATATGGCGAGCTTCCAGAACATGCTAAAATCAATGAAACAGACACATT

TGGTCCTGGAGATGATGATGAAATCCAGTTTGACGATATTGGAGATGATGATGAAG 
ACATTGATGATATCTAAATTGAACCAAGTGTTTTTACATGACAAGTTCTCTGAGGA 
TGGTTCTACAGTTGGGATTTTGGCCATCATCAAC 

Sequence ID - 638 nt : 545

TTTGAAGGCAAAGAGGGATTAATCTGTGCTGGCATCATGTAAGGAGACTTGATAGA 
TAAGAAAAAGCTTTACCTAAGTTTTGAAGAATAGGTTTTTCATAATGGAAAATTTA 
AGGGAAAAATCTCCAAAAAAGTGCTACTCAAGTTTTATCCATTTGTATTTCCAACA 
CAGCCTAGGACAGTACCTGCACATAGTAGGTGATTAATAAAAATTTAGAAAGCATT 

AATACTAAAGAGGAAAAATAGCAATGGCAAGAAAACACATGTAGGGAACACATGTA

GCCAAAAAATAATATATAATCAGAGAAATAATAGGACTTCTGGAAAAAAAAGATGA 
AGATAATAAGTATGAAAGAATTTTAGCTTAAAAATTAGCATAATTTGGATCCACAT

ATGCAAATCAATGAATGTAATTCATAATAT 

TGATTATCTCAATAGACACAGAAAAGGCCTTCAAAAAAATT 

Sequence ID - 639 nt : 624

GACACACGAGCATATTTCACCTCCGCTACCATAATCATCGCTATCCCCACCGGCGT 
CAAAGTATTTAGCTGACTCGCCACACTCCACGGAAGCAATATGAAATGATCTGCTG 
CAGTGCTCTGAGCCCTAGGATTCATCTTTCTTTTCACCGTAGGTGGCCTGACTGGC 
ATTGTATTAGCAAACTCATCACTAGACATCGTACTACACGACACGTACTACGTTGT 

AGCCCACTTCCACTATGTCCTATCAATAGGAGCTGTATTTGCCATCATAGGAGGCT
TCATTCACTGATTTCCCCTATTCTCAGGCTACACCCTAGACCAAACCTACGCCAAA 
ATCCATTTCACTATCATATTCATCGGCGTAAATCTAACTTTCTTCCCACAACACTT
TCTCGGCCTATCCGGAATGCCCCGACGTTACTCGGACTACCCCGATGCATACACCA 
(^TGAAACATCCTATCATCTGTAGGCTC^^ 

ATAATTTTCATGATTTGAGAAGCCTTCGCTTCGAAGCGAAAAGTCCTAATAGTAGA
AGAACCCTCCATAAACCTGGAGTGACTATATGGATGCCCCCCACCCTACCACACAT 
TCGAAGAA 

Sequence ID 641 

CAAGATGACAAAGAAAAGAAGGAACAATGGTCGTGCCAAAAAGGGCCGCGGCCACG
TGCAGCCTATTCGCTGCACTAACTGTGCCCGATGCGTGCCCAAGGACAAGGCCATT 
AAGAAATTCGTCATTCGAAACATAGTGGAGGCCGCAGCAGTGAGGGACATTTCTGA 
AGCGAGCGTCTTCGATGCCTATGTGCTTCCCAAGCTGTATGTGAAGCTACATTACT 
GTGTGAGTTGTGGAATTCACAGCAAAGTAGTCAGGAATCGATCTCGTGAAGCCCGC 
AAGGACCGAACACCCCCACCCCGATTTAGACCTGCGGGTGCTGCCCCACGTCCCCC
ACCAAAGCCCATGTAAGGAGCTGAGTTCTTAAAGACTGAAGACAGGCTATTCTCTG 
GAGAAAAATAAAATGGAAATTGTACTTAA 

Sequence ID 642 

TGCTTGGCCCTCTACCTCCTGCCCTCTTCCTGTTCATCTCCCAACCACTGCACTCT
TGATTTTTATACCACACAGAAGGTAAGAAAATTCTAGGAACCCTAAGGATCAATCC 
TCTCCATTTTCACTCAAATGCCTGGGGCCCAGCTCTGCAATGACTGACTCCAGGGC 
CTCTTTCCTCACTGCCAGCATAGAAGTCAGGGGAGCCAGCTGGGCCCTGCGGTCAG 
GAAGGTTCTCATTTTTGGAGCATTCCCTGAGCCCAGATCATAGGAGCAGCTGTCCC 
TGGTGGGACACAGGAGTCATGACTCCTACCCTCCACCCTCCACACCCACCAGGCAT
TTAGCAGTCTGTCCTATGCAAGACAGATGAATTCTCAGCCAGGATACCTCAAGGCA 
GGCAAAGGTGAGTGGAGGGAAAATTCAC2^AACATTCAGGGTGTGTGGTGCTGGCAT 
CACCATGGCCAAATCCAAGAGGTCTTCCTGGAAGAGGGCCCAAACTGGAACCAAAA
GAATGCTGTCAGCAGTTGGAATAGAGCTGTGAATT 

Sequence ID 643 


CTTTCCAAGAGGAATCCTCGGCAGATAAACTGGACTGTCCTCTACAGAAGGAAGCA 
CAAAAAGGGACAGTCGGAAGAAATTCAAAAGAAAAGAACCCGCCGAGCIAGTCAAAT 
TCCAGAGGGCCATTACTGGTGCATCTCTTGCTGATATAATGGCCAAGAGGAATCAG - 
AAACCTGAAGTTAGAAAGGCTCAACGAGAACAAGCTATCAGGGCTGCTAAGGAAGC 
AAAAAAGGCTAAGCAAGCATCTAAAAAGACTGCAATGGCTGCTGCTAAGGCACCTA
CAAAGGCAGCACCTAAGCAAAAGATTGTGAAGCCTGTGAAAGTTTCAGCTCCCCGA 
GTTGGTGGAAAACGCTAAACTGGCAGATTAGATTTTTAAATAAAGATTGGATTATA 
ACTCT 

Sequence ID 644

CTTTGATAGAGAAGAAAATTCTCCTAGGATACAAGAGCCTCAACATTTTAAAGATT 
TTCTGCATCTCAAAAGCGTAGGCTCCTTGCTGGGCAAGGTGAGCCTCTGTGAGTCC 
TCATAGGACCGAGCAAATCTGATTCACCCCAGAAAATCCAATATCGAAGCTGAGCT 
TTGGCCTGAGCGGGTTCCATTTCCTCCCCAGATCCTATTTAGGAAGTGTCTCCTGA 

CAACCTCCAAAAGGTGCTAACATGCAACGTTCTGAAGGGTTATTGCTCAAAAACAA
GATTTTCCTTGTGGTCAAGACTCTGCGAGCCTCGAACACGATGAATCCGCTCGAAT 
GGGCTTGGGCTTTGCCCGGGTGGCGCACGCTCACACGCTGGAAGCACAGCTTTGAC 
GATCTCCACACACGCACAGGCACACACGCCACAGATGATGCCGGCTCATTCTCAGG 
GGGTGTCTAAGTTCTGCTTTAAATATTTACCCCCTAATTGTACAAACAATAGGGGC 

ATGAGCCTGGTACTCGATAAATGGGGACTTNCTTAAAA



Sequence ID - 645 nt: 649 

CTACAGCCTGGGCAGCGCGCTGCGCCCCAGCACCAGCCGCAGCCTCTACGCCTCGT 

CCCCGGGCGGCGTGTATGCCACGCGCTCCTCTGCCGTGCGCCTGCGGAGCAGCGTG 

CCCGGGGTGCGGCTCCTGCAGGACTCGGTGGACTTCTCGCTGGCCGACGCCATCAA
CACCGAGTTCAAGAACACCCGCACCAACGAGAAGGTGGAGCTGCAGGAGCTGAATG 
ACCGCTTCGCCAACTACATCGACAAGGTGCGCTTCCTGGAGCAGCAGAATAAGATC 
CTGCTGGCCGAGCTCGAGCAGCTCAAGGGCCAAGGCAAGTCGCGCCTGGGGGACCT 
CTACGAGGAGGAGATGCGGGAGCTGCGCCGGCAGGTGGACCAGCTAACCAACGACA 

AAGCCCGCGTCGAGGTGGAGCGCGACAACCTGGCCGAGGACATCATGCGCCTCCGG
GAGAAATTGCAGGAGGAGATGCTTCAGAGAGAGGAAGCCGAAAACACCCTGCAATC 
TTTCAGACAGGAAATCCAGGAGCTGCAGGCTCAGATTCAGGAACAGCATGTCCAAA 
TCGATGTGGATGTTTCCAAGCCTGACCTCACGGCTGCCTTGCGTGACGTACGTANC
AATATGAAAGTGTGGCTGCCAAAAACCTTGCAG 

Sequence ID - 646 nt: 600 

GAGATGTCTCGCTCCGTGGCCTTAGCTGTGCTCGCGCTACTCTCTCTTTCTGGCCT 
GGAGGCTATCCAGCGTACTCCAAAGATTCAGGTTTACTCACGTCATCCAGCAGAGA 
ATGGAAAGTCAAATTTCCTGAATTGCTATGTGTCTGGGTTTCATCCATCCGACATT 
GAAGTTGACTTACTGAAGAATGGAGAGAGAATTGAAAAAGTGGAGCATTCAGACTT 
GTCTTTCAGCAAGGACTGGTCTTTCTATCTCTTGTACTACACTGAATTCACCCCCA 
CTGAAAAAGATGAGTATGCCTGCCGTGTGAACCATGTGACTTTGTCACAGCCCAAG 
ATAGTTAAGTGGGATCGAGACATGTAAGCAGCATCATGGAGGTTTGAAGATGCCGC 
ATTTGGATTGGATGAATTCCAAATTCTGCTTGCTTGCTTTTTAATATTGATATGCT 
TATACACTTACACTTTATGCACAAAATGTAGGGTTATAATAATGTTAACATGGACA 
TGATCTTCTTTATAATTCTACTTTGAGTGCTGTCTCCATGTTTGATGTATCTGAGC 
AGGGTGCTCCACAGGTAGCTCTAGGAGGGCTGGCAACTTA 

Sequence ID 647 

CGAATGTGCAGGTTTGTTACATAGGTATATATATGCCATGATGGAAATATTTATTT 
TTTTAAGCGTAATTTTGCCAAATAATAAAAACAGAAGGAAATTGAGATTAGAGGGA 
GGTGTTTAAAGAGAGGTTATAGAGTAGAAGATTTGATGCTGGAGAGGTTAAGGTGC
AATAAGAATTTAGGGAGAAATGTTGTTCATTATTGGAGGGTAAATGATGTGGTGCC 
TGAGGTCTGTACGTTACCTCTTAACAATTTCTGTCCTTCAGATGGAAACTCTTTAA 
CTTCTCGTAAAAGTCATATACCTATATAATAAA.GCTACTGATTTCCAAAAA . 

Sequence ID 648

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 

Sequence ID - 649 nt : 425 

CAAAAAAACGAAGAAAAGTGACGACAGTCTGAGGGACTTATGGGAGATCATCAAGT 
GAACCACTATATGTGTAATGTAAGTCTTGGAATGAGAAGAGAGAAGGAGAAGGAGG 
AGAGAGCTTATTTGTAGAAATAATGGCTGAAAACATCCCAAACTTTCCTTTTTTTG 
AGGAAAGAAATAGGCATACAAGTTCAAGAAACTCAAGGAACTCCAGAGAGGACAAT 
TCTAAAGACACCCCCTCTAACATACATTATAATCAAATTGTCAAAAGTAAAATACA 
AAGAGAATCTTTTAAATTGACAAGAGAAAAGCAGCTGGTCACGTTCAAGGGAGTTC 
TATAAGAATTTCAGCAGATTTCTCAGCAGAAACCTTGCAGGCCAACAGGCAGTGGG 
ATGATACATTCAAAGTGCAAAAAAAAAAAAAAA 
Sequence ID 650

CGAGAGTTTACCAGTNGCCTAATAATGCAATAAAAAATGCTTTGAGATAGCTAACN 
GCCCATAAi^CA2\ACTCAAATTGCTTATAAAGTTTCTTCCCATGTTCCCATTTGAT 
GAAAAGTCTTACATCACATATAACTGGGAAGCAGGGGTCCCTCCTCAATTTTCAGA
CATTTTGAAAGGATGACAGTTCTGTTTGTTAGATGAGTAAACCTCTATATTCATAA
GTTCTAAAATCCTTCATTATGAGGGATTCAAAGTATTTATAAAAACACTGCCCTCT 
AAAAATTTCCTCAGATCTGAAGTATGGNCTTGGNCCTGAATATACAGTGTTATCCT 
ATGTTTAAAAGGGTGATCCAGACATGAGACGCAACTAGTTGGTGCATAAGAAGGCC 
CCACTTGGCTATTTCATATCTACCTACAATTGACC^AAAAAATTTTTTAGGC(^^ 
CAATTATTATTTAGCTTCGCTCTTTCTAGTGCAAGAAACTGCAGGCTGGATCAGTA
GTTCAACkGCTAAACAGTC^TAAAATAGT^ 

CTTCAAAGATAAATTCCAATTCTATTTACTTATTCATTGNGACNGNATTACTAAAC 
AGGTAAGGATGGGAATA 

Sequence ID - 651 nt: 251

CTTTGGGAGGCCGAGGCGGGCGGATCACTTGAGGTCAGGGGTTCGAGACCAGTCTG 
GCCAACATGGTGAAACCCCAACTCTACTAAAAATACAAAAGTTAGCCAAGTGTGGT 
GGCAAGTGCCTGTAATCCCAGCTACTCGGGAGGCTGAGACAGGAGAATCACTTTGA 
ACCTGGGAGGCGGAGGTTGCAGTGAGCCAAGATCGTGCCACTGCACTTCAGCCTGG 

GCAACAGAGCAAGATTCCGTCCATCTC

Sequence ID 652 

CTTTCTTCAGCCTTGCAGACACCTAAACATCATGTAATTACCTAAGGAATTCCCAA 
GTGCCTCTTCCAGGTTATACGTGTAAATAGCTGTTTTTATGCAAGATTAGTTAGAT 
ACTGCTCTTTACAGGATGAGTGGTGTTGTCTTTGGCTGGGGGGGNCTTAAATGTGT
TTCTAATGTGTGTGTCAAATAATTACCTGTTAAACAGACTGCCAATCTGGCTGAAG 
CCAATGCTTCTGAAGAAGATAAAATTAAAGCAATGATGTCGCAATCTGGCCATGAA 
TACGACCCAATCAATTACATGAAGAAACCTCTAGGTCCACCACCTCCATCTTACAC 
GTGTTTCCGTTGTGGTAAACCTGGACATTATATTAAGAATTGCCCAACAAATGGGG 

ATAAAAACTTTGAATCTGGTCCTAGGATTAAAAAGAGCACTGGAATTCCCAGAAGT

TTCATGATGGAAGTGAAAGATCCTAATATGAAAGGTGCAATGCTTACCAACACTGG 
AAAATATGCAATCCAACTATAGATGCAGAAGCATATGCAATTGGGAAGAAAGAGAA 
ACCTCCTTNTTACCAGAGAGCCATCTTNTTTCT 

Sequence ID 653

GTTGTGACTCGTTGGCATGTGATCTGAAGTTCCTGCCCTGCAGCTGACGAGCCAGT 

GTTTCAATAATTAAAAAGAACTCAACTCACT^ • 
TGCGCTTTGCATGTATGTATCACAAT^

ATTATGTGTCAA.TAAAAAACAAAAATTAAAATCCCAATTTTTA 
Sequence ID 654 

GTTGCTAGTAGCGGCAGGAAGATGTCAGGCTCACTTTCCTCTGATTCCCGAAATGG
GGGGAACCTCTAACCATAAAGGAATGGTAGAACAGTCCATTCCTCGGATCAGAGAA 
AAATGCAGACATGGTGTCACCTGGATTTTTTTCTGCCCATGAATGTTGCCAGTCAG 
TACCTGTCCTCCTTGTTTCTCTATTTTTGGTTATGAATGTTGGGGTTACCACCTGC 
ATTTAGGGGAAAATTGTGTTCTG 


Sequence ID 655 

GTCCCCGGGAATCGCGGCCGCGTCGACGGTTTATTTTCAGTGCTTGAAGATACATT 
CACAAATACTTGGTTTGGGAAGACACCGTTTAATTTTAAGTTAACTTGCATGTTGT 
AAATGCGTTTTATGTTTAAATAAAGAGGAAAATTTTTTGAAAAAAAAAAAAAAAAA 
15 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA^y^ 
AAA&AAAAAAAAAAATTTTT 

Sequence ID 656 

TAGAGGCCTGAATAGGTAGACAATGGCAGCAGCGTTTTTAATCACAGTCCTATTCA 
TGCCCTAATTCGGGAGTGATGATTAAAGGACATTAGAGGGAGCACTTTGACATCTG
ATCCTTTGAACTGACGTCTGTGCAGGCTGCACTCCATAGAGCTCACTTGGCCAAAC 
TGATTTCCTTAAATAAAGTGCTGTGATTTCCAATGTAGGAAATATTACATTAGAGC 
CTATTGAAATGATTAGGAATTGAGGAGCTTTTCTTTAGGTGGGAATGTGGTGTATG 
CTGTATACTCACAAAAGTGAGATCATTAATATTGCATGTACTACTTTGAATATCAG 
GGACCACAGAGAAATAGCATGAGAAACGCCTTCCT

ATATGAACAAAAATGTGGAACTCTGCTGTCATAGCTCTCCG 



Sequence ID 657 

GGGGCTTCTGCTGAGGGGGCAGGCGGAGCTTGAGGAAACCGCAGATAAGTTTTTTT 
CTCTTTGAAAGATAGAGATTAATACAACTCTTAAAAAATATAGTCAATAGGTTACT
AAGATATTGCTTAGCGTTAAGTTTTTAACGTAATTTTAATAGCTTAAGATTTTAAG 
AGAAAATATGAAGACTTAGAAGAGTAGCATGAGGAAGGAAAAGATAAAAGGTTTCT 
AAAACATGACGGAGGTTGAGATGAAGCTTCTTCATGGAGTAAAAAATGTATTTAAA 
AGAAAATTGAGAGAAAGGACTACAGAGCCCCGAATTAATACCAATAGAAGGGCAAT 
GCTTTTAGATTAAAATGAAGGTGACTTAAACAGCTTAAAGTTTA

Sequence ID 658 
GACCTTTGAGAAAATTAATTTAAATCCTAGAACTTTGGGTGAACCGAAGAAATTTA
TAATATTTGTTTAGTTAATAACAGATAAAAAGGAAAGATTCAAGCCTATTGGATGA 
GAATTTGTACATTATTTTAGAGCTAATAATAATGGTTTTCAGTTTAGTGAGGATTT 
AAAAAATGTTTTTGAATCAAACTTTTTTTCTTTATAATCCTTTTTAACTAACTCAG 
GAAATAAGGTATTATGAAATCCACACACTGTTACCTCCTTAAAGTATGAGGATACT 
TCCCACTGTTTGGTCCACTAGTGGCTGATTATTTTGTTTGTGGATTATTTGTAATT 
TTCTTTTTAATTCTTCCTTAAAGAGCATGGCATTTGGAGTCACAGACCTATATTTG 
AATCCTGTCATTTACTAGCGTTTTGACCTTGAACAATTATGCTCAGAGTCTCAGTT 
TTTTCTTGTAAAGTGATGATGATACTACTTAACTCACAGGGTTGTAGTGAAGATCA 
AATGAGATCATGTCTGTANAACACCCTGCCCGGCACTCAATAAGTATTAATAGGAA 

CCCATATACCTC 
Sequence ID 660 

TGTTTTTATTTTTTAAAAGGTATAAACACCAAAAZ^AAA^TTJ^ACATTGTATGAAG 
ATGGAAAATAAGAAGATGCACTTTCTGTAACTTTGTCTAAGGATTTAAATTACTAA 
CTTATGAACTCCAATTTGAATTGAACTTAACTATCGGCTTTCTTACTGGTAAAATT 
ATATGGTTTATTTTAAATGCGTACATATTGACCAATGGCCTCTGAAAAAGCACATT 
TTAGATACTGAAATTGAAGGAAAGAAAATGCATCTTCAAACATTTTTTGGAATCTC 
ACCACATATACTTTGTTANATTTGTGTATTGTAGGGTGTTTGTTTTGTATTTTTGT 
ATTGTATATGAACTTTTTTTAAATGTGACAGTTAAACACATCTTTAAAAGCATAGT 
CACAGACAAAAGCATACAGTATAAAAATTTCCTTGAAAACTCCTACAATATTATAT 
TTGGAGGCAGCTTCAGACTGTTTTATTGG 

Sequence ID 661

CTCTGGCACACATTAGTTCCTCTTATATTACATTGATATAAGCAAGTCATATGGAT 
TTATCTGAGTGTAAGGAGAGCTGGAAAAAATAGTTTCTAGCAGGTCAGCCACCTCC 
CAGTGAGGGCTGCATACCATAGAAGGGGAGAATGAATTTTGGGAAAACAGGTAATT 
ATCTCTGTCACAGAAGGGGATGAAAAGTATGGTAGTTACNCAAGTTANACATCTGT 
ATGGAAAATACCACTTGGTTCTACAAATGNGG 

Sequence ID - 663 nt : 627 

GCCTCCCGGGTTCAGGGATTTCTCCTGCCTCAGCCTCCTGAGTGGCTGCATTGCAG 

GCACCTGCCACCACGCCTTGCAAATTTTTGTGTTTTTAGTGGAGATGGGGTTTTGC 

CATGTTGGCCAGGCTGGTCTCGGACTCCTGACCTCAGGTGATCCGCCCGCCTCAGC 

CTCCCAGAGGGCTGGGATTACAGGCGTGAGCCACTGTGCCTGGCCCCAAGTTTTGC 

ATCTTTTAATGCCCTCTGAACAAATACATAGAGAAAACTCTCAGAACAATTAAAAC 

CTGCAGAGGAACAGTGTCCTCCATGTCTTAGGTTTCAAGTTTGCCTCTAAAATTCT 




AATCCATATTTTTCTACTTCTCAGATAATTTATGTGTGTGTACTCTTCCTAGACGT 
ACAAGAGACTTTTTAATGCTAAATATTTGTCAGTGCTTAACAAAAACTCAATTTCA 
CATTACTCATATTGTTTTTGTTTTAATTGAATGTGAATTAAATTTTTATTAGTTAT 
TTGATTTGGAATGTTATGTATGCCATTAACACTATTAGGGGAATCTCTAGCATTTC
TGTATTTTTAAAGAATTTGATTCTTTTGTANATTCTGCCTGTGTGGCATTTTAAAC
ATGTGTGACAT 

Sequence ID - 665 nt : 345 

ACCGGCGACATGGCCAAACGTACCAAGAAAGTCGGGATCGTCGGTAAATACGGGAC 
CCGCTATGGGGCCTCCCTCCGGAAAATGGTGAAGAAAATTGAAATCAGCCAGCACG 
CCAAGTACACTTGCTCTTTCTGTGGCAAAACCAAGATGAAGAGACGAGCTGTGGGG 
ATCTGGCACTG^GGTTCCTGCATGAAGACAGTGGCTGGCGGTGCCTGGACGTACAA 
TACGACTTCCGCTGTC^CGGTAAAGTCCGCGATCAGAAGACTGAAGGAGTTGAAAG 
ACCAGTAGACGCTCCTCTACTCTTTGAGACATCACTGGCCTATAATAAATGGGTTA 
ATTTATGTA 

Sequence ID - 666 nt: 252 

ATAATTCAGAACTTCTTCATATGCTCGAGTCTCCAGAGTCACTCCGTTCTAAGGTT 
GATGAAGCTGTAGCTGTACTACAAGCCCACCAAGCTAAAGAGGCTGCCCAGAAAGC 

AGTTAACAGTGCCACCGGTGTTCCAACTGTTTAAAATTGATCAGGGACCATGAJ^AA

GAAACTTGTGCTTCACCGAAGAAAAATATCTAAACATCGAAAAACTTAAATATTAT
GGAAAAAAAACATTGCAAAATATAAAAT 

Sequence ID 669 

TTACTTTTAACCAGNGAAATTGACCTGCCCGTGAANAGGCGGGClSrrGACA.CAGCAA 
GACGAGAAGACCCTATGGAGCTTTAATTTATTAATGCAAACGGTACCTAACAAACC 
CACAGGTCCTAAACTACCAAACCTGCATTAAAAATTTCGGTTGGGGCGACCTCGGA 
GCAGAACCCAACCTCCGAGCAGTACATGCTAAGACTTCACCAGTCAAAGCGAACTA 
CTATACTCAATTGATCCAATAACTTGACCAACGGAACAAGTTACCCTAGGGATAAC 
AGCGCAATCCTATT 

Sequence ID 670 

GGCTGATTCCTGAGCTATAAAAGCATAATTGCTTTATATTTTGGATCATTTTTTAC 
TGGGGGCGGACTTGGGGGGGGTTGCATACAAAGATAACATATATATCCAACTTTCT 

GAAATGAAATGTTTTTAGATTACTTTTTCAACTGTAAATAATGTACATTTAATGTC

ACAAGAAAAAAATGTCTTCTGCAAATTTTCTAGTATAACAGAAATTTTTGTAGATG 
AAAAAAATCATTATGTTTAGAGGTCTAATGCTATGTTTTCATATTACAGAGTGAAT 
TTGTATTTAAACAAAAATTTAAATTTTGGAATCCTCTAAACATTTTTGTATCTTTA
ATTGGTTTATTATTAAATAAATCATATAAAAATT 

Sequence ID 671

CAGGAAGTCACCTGGGATTGGCTGCCTCACCCACTCACAGTGCCATCCCTGCCCCA
GGCCTCCCAGTGGCAATTCCAAACCTGGGTCCCTCCCTGAGCTCTCTGCCTTCTGC 
TCTGTCTTTAATGCTACCAATGGGTATTGGGGATCGAGGGGTGATGTGTGGGTTAC 
CTGAAAGAAACTACACCCTACCTCCACCACCTTACCCTCACCTGGAGAGCAGTTAT 
TTCAGAACCATTCTACCTGGCATTTTATCTTATTTAGCTGACAGACCACCTCCACA 
GTACATCCACCCTAACTCTATAAATGTTGATGGTAATACAGCATTATCTATCACCA 
ATAACCCTTCAGCACTA 

Sequence ID 672 

CAGGAAGTCACCTGGGATTGGCTGCCTCACCCACTCACAGTGCCATCCCTGCCCCA 
GGCCTCCCAGTGGCAATTCCAAACCTGGGTCCCTCCCTGAGCTCTCTGCCTTCTGC 
TCTGTCTTTAATGCTACCAATGGGTATTGGGGATCGAGGGGTGATGTGTGGGTTAC 
CTGAAAGAAACTACACCCTACCTCCACCACCTTACCCTCACCTGGAGAGCAGTTAT 
TTCANAACCATTCTACCTGGCATTTTATCTTATTTAGCTGACAGACCACCTCCACA 
GTACATCCACCCTAACTCTATAAATGTTGATGGTAATACAGCATTATCTATCACCA 
ATAACCCTTCAGCACTAGATCCCTATCAGTCCAATGGAAATGTTGGATTANAACCA
GGCATTGTTTCAATANACTCTCGCTCTGTGAACACACATGG 

Sequence ID 673 

GGGTTTTCTTTCGGAAGCGCGCCTTGTGTTGGTACCCGGGAATTCGCGGCCGCGTC 
GACTGCTAAACAGAATACTGCTATTTTGAGAGAGTCAAGACTCTTTCTTAAGGGCC

AAGAAAGCCACNTGNNCCCTNGGNCTAATCTGGCTGAGTAGTCAGTTATAAAAGCC 

NTAATNGCTTl^TNTTTGGNNTCNTTTTTNNCNGGGGNCGGNCTTGGGGGGGGTTG 

CNTCCAAAGATANCATNTNTTTC 

TTTCCNCCNGAAAANANNGCCCT^ 
NTTTTCTANTATNACAAANNTTTTNGTAGAANAAAAATTTTTTTTTAGNGGCTACC

CTTTNTTTNTTANNC^^ 

CCACAACCTTGGGTCTNTAATNGGGGGGTTTTTAAATAAANCNTNTl^TAAATCCCC 

cnsnsNis^^ 

TCCCCCNCCCTTTTTCTTCCTGCCGGCCCCAATTTAAGCCCNGGCGCTTGGGGCAA 
ATCCCCCTTTAGNGGGGGGGTTTANAAAAACCNGGGGCGGGGNTTTAAAACCNCGG 

GGNNNGGGGAA 
Sequence ID 674

ACCTCTAGCATCACCAGTATTAGAGGCACCGCCTGCCCAGTGACACATG-TTTAAC 
GGCCGCGGTACCCTAACCGTGCAAAGGTAGCATAATCACTTGTTCCTTAATTAGGG 
ACCTGTNTGAATGGCTCCACNAGGGTTCACTTGTCTCTTACTTTTAACCAGTGAAA 
TTGACCTGCC

Sequence ID - 675 nt: 591 

GTATAGAAAATAATGTCCCCAGNGCATAGAAAAAATGAGTCTCTGGGCCAGTGAAT 

ACAAAACATCATGTCGAGAATCATTGGAAGATATACAGAGTTCGTATTTCAGCTTT 

GTTTATCCTTCCTGTTAAGAGCCTCTGAGTTTTTAGTTTTAAAAGGATGAAAAGCT
TATGCAACATGCTCAGCAGGAGCTTCATCAACGATATATGTCAGATCTAAAGGTAT 
ATTTTCATTCTGTAATTATGTTACATAAAAGCAATGTAAATCAGAATAAATATGTT 
AGACCAGAATAAAATTAATTATATTCTGGTCTTCAAAGGACACACAGAACAGATAT 
CAGCAGAATCACTTAATACTTCATAGAACAAAAATCACTCAAAA^ 

C/^AAGAATTCATGAAAAAGAAAGCCTTTGCCATTTGTCTTAGAAAGTTATTTTTTA
AAAAAAAATCATACTTACTATTAGTATCTATGGAAGTATATGTAACAATTTTTATG 
TAAAGGTCATCTTTCTGTGATAGTGAAAAAATATGTCTTTACTAAGTTGAAATGAA 
TACTTTCTGNCTTTGCTAATGGATAGTTATT 

Sequence ID 676

CTCAATTCTACTAAAAAGCCCCCCAAGAAAAGCGAATGAGAAAACAGAGTCATCCT 
CTGCACAGCAAGTAGCAGTGTCACGCCTTAGCGCTTCCAGCTCCAGCTCAGATTCC 
AGCTCCTCCTCTTCCTCGTCGTCGTCTTCAGACACCAGTGATTCAGACTCAGGCTA 
AGGGGTCAGGCCAGATGGGGCAGGAAGGCTNCGCAGGACCGGACCCCTAGACCACC 

CTGCCCCACCTGCCCCTTCCCCCTTTGCTGTGACACTTCTTCATCTCACCCCCCCC
TGCCCCCCTCTAGGAGAGCTGGCTCTGCAGTGGGGGAGGGATGCAGGGA 

Sequence ID 679 

GNANCNTTTCCTNTCGNAAANCGCGCCTTGTGTTGGTACCCGGGAATTCGCGGCCG 
3 0 CGTCGACAAAAAAAAAAAAAAAAAAAAAAAAAAAAANTNTAGAC 

ATGCANGCNTGCGGCCGCAATTCGAGCTCGGCCGACTTGGCCAATTCGCCCTATAG 
NGAGTCGTATTACAATTCACTGGCCGTCGTTTTACAACGTCGNGACTGGGAAAACC 
CTGGCGTTACCCAACTTAATCGCCTTGCAGCACATCCCCCTTTCGCCAGCTGGCGT 
AATANCGAANAGGCCCGCACCGATCGCCCTTCCCAACAGTTGCGCAGCCTGAATGG 
CGAANGGAAATTGTAAGCGTTAATATTTTGTTAAAATTCGCGTTAAATTTTTGTTA
AATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGCAAAATCCCTTATAAATCAA 
AAGAATAGACCGAGATAGGGTTGAGNGTTGTTCCAGTTTGGAACAANAGTCCACTN 




TTAAAGAACGNGGACTCCAACGTCAAAGGGCGAAAAACCGTCTATCAGGGCGATGG 
CCCACTACGTGAACCATCNCCCTAATCAAGTTTTTTGGGGTCGAGGNGCCGTAAAG 
CACTAAATCGGAACCCTAAAGGGAGCCCCCGATTTAAAGCTTGACGGGGAAAGCCC 
GGCGAACGTGGCGAAA 

Sequence ID 682 

CACCTGCAGTCCAAGTAGATCGGCACGGGCCACGCCGACACCACCAAGTGGGAGTG
GCTGGTGAACCAACACCGCGACTCGTACTGCTCCTACATGGGCCACTTCGACCTTC
TCAACTACTTCGCCATTGCGGAGAATGAGAGCAAAGCGCGAGTCCGCTTCAACTTG 
ATGGAAAAGATGCTTCAGCCTTGTGGACCGCCAGCCGACAAGCCCGAGGAAAACTG
AAACTTTGCTTAACNACCGAATGGNGGGGANCTTTTCCAACGNTTTT 

Sequence ID 683 
TTGCTTTCATACTGNTGGGGNT^ 
ATCCCTAATGNGGCAATACTGAAAGGTGGGGCCTTTGAGATGTGATTGGATCGTAA
GGCTGTGCCTTCATTCATGGGTTAATGGATTAATGGGTTATCACAGGAATGGGACT 
GGTGGCTTTATAAGAAGAGGAAAAGAGAACTGAGCTTGCATGCCC 

Sequence ID - 684 nt : 545 

GTGGAAGNGACATCGTCTTTAAACCCTGCGTGGCAATCCCTGACGCACCGCCGTGA
TGCCCANGGAAGACAGGGCGACCTGGAAGTCCAACTACTTCCTTAAGATCATCCAA 
CTATTGGATGATTATCCGAAATGTTTCATTGTGGGAGCAGACAATGTGGGCTCCAA 
GCAGATGCAGCAGATCCGCATGTCCCTTCNCGGGAAGGCTGTGGTGCTGATGGGCA 
AGAACACCATGATGCGCAAGGCCATCCGAGGGCACCTGGAAAACAACCCAGCTCTG 

GAGAAACTGCTGCCTCATATCCGGGGGAATGTGGGCTTTGTGTTCACCAAGGAGGA

CCTCACTGANATCAGGGACATGTTGCTGGCCAATAAGGTGCCAGCTGCTGCCCGTG 
CTGGTGCCATTGCCCCATGTGAAGTCACTGTGCCAGCCCAGAACACTGGTCTCGGG 
CCCGATAAGACCTCCTTTTTCCAGGCTTTAGGTATCACCACTAAAATCTCCAGGGG 
CACCATTGAAATCCTGAGTGATGTGCACTGATCAAGACTGG 

Sequence ID 685 

GGAAAGGGCCATTTTATTGCCTAAAACCACCTGGNTTTTNAGGTAACAGTTCCAAC 
ATGTCCTTTTTTGAATAGCTGTTCTAATTATTATATATTCAGCTGATTAATAGGAG 
TACTTGATAGGTGGACTGTGTCAGGTAGCCTCAGGCAATCCTACTTCAACAAGCTG 

TCAGGGAGCCATGCCATGCTTCTTTATGACATAGGTGAATTTGATAGGCTCACTAG

CAGAACATGGGATCACAAGGTGGAACCNTTCCNTTT

Sequence ID 686

GACCCCTTCCTTACACCTTATAO^AAAAAACTGAAACTGGACCCCTTCCTTACACC 
TTATAC2\AAAATTAACTCAATTTTATTATGTTGTATTAAATTAAGTTGGGTTTAAT 
TAAGATGGATTAAAGACTTAATTATAAGACCTAAAACCATAAAAACCCTAGAAGAA 
AACCTAGGCCATACCATTCAGGACACGGGTATGGGCAAAGACTTCATAACTAAAAC
ACCAAAAGCAATGGCAACGAAGTCCAAATAGACAAATTGGACCTGATTAAACTAAA 
GAGCTTCAGCACAGCAGAAGAGACTATCGTCAGAGTGAACAGGCAACCCACAGAAT 
GGAAGAAAATTCTTGCAATCTATCCATCTGACAAGGGGCTAATATCCAAAATCTAC 
AAAGAACTTAAACAAATTTACAAGGAAAAACACAAACAACCCCATCAAAAAGTGGG
CTAAGGATGTGAACAGACACTTCTCAAAAGAAAACATTTATGCAGCCAA
GAAAAAAAGTTCATCATCACTGCTCATTAGAGACATGCAAATCAA 
GATCCCATCCCACACCAGTTAGAATGGCAATCATTAAAAATGT 

Sequence ID - 687 nt : 268

TTTATGTGTTTTTGCTTGGGGGGCGCTGGGCCTAGCCCAGAGTAGTGCTTGCTCCC 
CCTGCCTTGTCCCACCAGGGAGGCAGCAGACTCAGGCCCTCCATGGTCCTCTTTGT 
CATTTTGTTGACATGCATTCCTCCTTTTGTCATCTTGTTGGGGGGAGGGGATTAAC 
CAAAGGCCACCCTGACTTTGTTTTTGTGGACACACAATAAAAGCCCCGTTTATTTG 
TAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 

Sequence ID - 688 nt : 569 

CTTTAGCCAGCCTGATCAGAAAAAAACAAAAGAAGAGGAAAGACGTAGATTACCAA 
CATCAAGAATGTGAGTTATGATATCACTACAGACTCTCCAGGTATTAAAAGCATAA 
TTAGAGAATGATATGAGCAGCTATATGCAAATAAGTTCAACATTGGACAAATGGAC 
AAATTTCTTGAAAGATAAATTATGAAATTTCATTCTGAAAGAACTACATGACCTTA 
ATTGTCTTACATCTATTAAATAAGTGGAAATTGTAGTTTAGAAACTTTCCCACAAA 
GAAAACTCTAGGCCCAGATGGCATCAAAATAATATTCAGATGAATGAAATGGAGAA 
AGGATAGCCTTTTCAACAAATGGTGGTGGAACAATTGGATTTCCATATGCAAAAAA 
ATAGAGATGGACGCAGAGGTGTGTGCTTAGGAGGCTGAGGTGAGAGGATTGTTTGA 
GGCGAGCCTGGGCAACATAGCAAGACCCCATTTCAAAAACAAAAATAAAGAACTTG 
TAGCCTTACCTTGTGCCATATTATGAAAATGTATCATAGGCTTAAATGTGAAAGGT 
AAAACAAAA 

Sequence ID 689

CGCAGGGGCTTCTGCTGAGGGGGCAGGCGGAGCTTGAGGAAACCGCAGATAAGTTT
TTTTCTCTTTGAAAGATAGAGATTAATACAACTACTTAAAAAATATAGTCAATAGG 
TTACTAAGATATTGCTTAGCGTTAAGTTTTTAACGTAATTTTAATAGCTTAAGATT 
TTAAGAGAAAATATGAAGACTTAGAAGAGTAGCATGAGGAAGGAAAAGATAZV?^AGG
TTTCTAAAACATGACGGAGGTTGAGATGAAGCTTCTTCATGGAGTAAAAAATGTAT 
TTAAAAGAAAATTGAGAGAAAGGACTACAGAGCCCCGAATTAATACTAATAGAAGG 
GCAATGCTTTTAGATTAAAATGAAGGTGACTTAAACAGCTTAAAGTTTAGTTTAAA 
AGTTGTAGGTGATTAAAATAATTTGAAGGCGATCTTTTAAAAAGAGATTAAACCGA
AGGTGATTAAAAGACCTTGAAATCCATGACGCAGGGAGAATTGC 

Sequence ID 690 

CGAAAAGCAAATATAACTTGCCACTAACCAAGATCACCTCTGCAZ^uAAGAAATGAA 
AACAACTTTTGGCAGGATTCTGTTTCATCTGACAGAATTCAGAAGCAGGAAAAAAA
GCCTTTTAAAAATACGGAGAACATTAAAAATTCGCATTTGAAGAAATCAGCATTTC 
TAACTGAAGTGAGCGAAAAGGAAAATTATGCTGGQGCAAAGTTTAGTGATCCACCT
TCTCCTAGTGTTCTTCCAAAGCCTCCTAGTCACTGGATGGGAAGCACTGTTGAAAA 
TTCGAACGAAAACAGGGAGCTGATGGC^GTACACTTAAA^CGCTCCTCAAAGTTC ^ 
AAACTTAGATTTCAGATTT

Sequence ID 691 

CCGGTCTCTACACAATATATAGAAATCTGGGCATGGTGGTGCCTGGCTGTAGTCTC 
AGCTACCTAGTTGGGTGAGGTGGGAGAGTCGCTTGAGTCCTGGAGGTTGAGGCTGT 

AGTGAGCCAGGGCTGCACCACTGCATTCCAGCCTGGGTAACAGAGTGAGACCCTGT
CTCAAAAAGAAAAAAT^AAAATTGCTAATTTTAACAAATCACAAAACTGACTCAGGC 
AAGTTGTCTGACTGAAAAGCCCTTGAAAAACCATCAAAGACAGTAGAATGTTAACT 
GGTCATTTACGTAAAATAGTGTTCATTAAATTTTTGGTTCATTTAGGATAATCATT 
TTAAATGAGACTGTATTTGAGACTGTATACACATACATATACATGTTTACACACAT 

ATACGTACAATATATGTACATTCTATCTAAAAGATCATACATGTGTGTACATATAT
GTTTTTAAAAGTCAAACTGACATATTAATGGAAACAGTGCTTACATCTCTGGTAGT 
GATTTTCTATTAGCAGCAGCCCTACATATGCTGCGTCTCTGAACAGCATGTCAGTG 
CCATGACTGTCTAAACATGCAAATATGACTGACAGACTCTTGAGACAGCTTTCACC 
TTG 

Sequence ID 692 

AATTCGNGGCCGCGTCNNCCTANGAGGCACCAGGAAATCCCGCGGGGTGGCCCATG 
CAGACCAGGCGC^CGTGGCTCATGGGGCANAATTGCCAAGGACAGCTCACGACAG^ 
GCCACCTTCTCACCATTCCAGCCAAGGAGAGATGTGACGTTGGAACTGCTCTGGCA 
CTTCTGTCAAGCCTCCCCCGCCCCAATTGCCTTGAGATCTCTGCTCTTTGTCAGAG
ATTTGCAJ^AGACTCACGTTTTTGTTGTTTTCTCATCATTCCATTGTGATACTAAGA 
AACTAAGAAGCTTAATGAAAAGAAATAAAATGCCTATGTTGTTGTTCT 

Sequence ID 693

CTAGAACCCATGACTCCTAGGTCTTATACTGCAACCACAGTATCAGCAAATAATCT 
TTCATAAGGGGATTATTCTCTGATTAACAGGAAATACAGGAATTTAATTTGTGAAC 
ACGCTAGGTAGAAGGAGAAACCCAAATC
ATTCTATAACTAAGATCTAACAGTCATTTTCTTCCCAGTAAGAAATAACCAAAGCA
TGCTAAAAATCACTGGACTAAATTGGTGTCAAAACTGCC^CATTGCCAGGCATGGG 
GGGGTCATACTTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGAAAATTGCTTGA 
GGCCAGGAGTTCGAAACCAGCCTGGGCAACACAGTGAGACCCCATCTCCACAAAAA 
AAAAAAATTAAAAAACAAAACAAAAG^ 
GTCCCAGCTACTCAGGAGCCTGAAGTGAGAGGATCACTGAAGCCCAGGAGGTAGAG
CTATGACTGTAGTGAGCTATGACTGTGCCACTACACTCCACCTGGGTGACAGGGGA 
CTC 

Sequence ID 694

CGACTTCCATTTGTAT^FAATGGAATACTAAGTCCCTCTGTGATTTCTGAACCAAGC 
TATTCCTAGGCCTGAGTTTTATTTTGTTGACACAGAAATAAATTANAAGGCCAAGC 
GTGGTGGCATGTGCCTGTAGTCCTAGTTGCTGAGGTAAGAGGATTGCTTGAGCCCA 
GGAGTTCAAGGCTGCAGCAAGCTTTGATTGCGCCACTGCACTCCAGCCTTGGCGAC 
AGACTAAGACGCTGTCTCAAAAAAAAACAAAAA 

Sequence ID 696 

GGTTATCAATGAGATTAAGAGACAACTAGAGTAAAAAGAAAAGAAZ^AGAAAAGAAA 
NGAA2\ACAACAGAAGCTCTATTAACTGACCTCTAACCAATACAACAGGTTAACTGA 
TGTTCTCCATTCTGTATATAAAAATCC^^ 

CTTGTAGGACACTTTCTAGTTCATCTGAGCACTTTTGTTCTCAGCAGTTGAGCTGT 
ATACTTAGCAACATTTGGTGCTTCCAAACCCATTTGTGCCTGTAGCACTTACTATT 
GAAATACATAATTTAATTAAATATTATATAAAGGAATGGAATACGAGTTGGACAAG 
AAAAAGAGTTAAATCTGAAGGTTAGGTAAAAAGAGCAACTTCTTTTCTCTGTTTTG 
CAGGTTGGCAAAATCATTTAAAAACAATTGGAAGTATTATATGTTCTGCATTAAGT 
TGTCATTTTACTTAAAAACTAGGCATCAAAGATGATGCATAATAAATTTAGTGTAT 
GCAAGAATGACTGCTTGGGACCTCAATATATGAATTCTTAATCCAAGGAAAGTCCT 
TGGCCTTACATTTAAAAGTCGGCAAATAAGTGTACGTTCATT 

Sequence ID 697 

GAACATTTAAAAATAATGCAAATAAGGCTGGGCGTGGGGGCTCACACCTGTAATCC
CAGCACTTTGGGAGGCCGAGGCAGGCAGATCACGAGGTCAGGAGATTGAGAGCATC
CTGGCTAACACAGTGAAACCCTGTCTCTACTT/y^AAAATAAAAAAATTAGCCAGGC 




GTGGTGGTGGGCGCCTGTAGTCCCAGCTACTCAGGAGGCTGAGGCAGGAGAATGGT 
GTGAACCCGGGAGGCGGAGCTTGCANTGAGCTGAGATCGTGCCACTGCACTCCAGC 
CTGAGCGACAGAGCGAGACTCTGTCT 

Sequence ID 698

TCATTAGAATCCAAGCTTTGAAAATTTCTGATTAATGCTCATGTATTTCTTTATCT 
TTGTTTTTCCTTGTGAAGAAAGACTTTCACCACTGTCTGAGTGATGATGCTGTTGA 
TAAGGATGATGTCGATGACTACTATATTGCATCTCTCAGGAACAGCTGATGGGAAG 
GGAGGGGCTGCTGAGTTCCCTTGTTCTAGCTAGCAGCACGCTCCTCANAGAGGGGG 
CCGAGTTACAGACAGCAGCCGCATTCTCATGCAAAATTAGTTTTAAACTGCTAGTG
TGGGCATCGGTACCTTTTGCCTGGGTGATACCGAAGAATTGTTGAGGATTTAGTAir 
GCTCCGTAGAGACAG^TC^GCCAGTCATTTCTGCATTGGAGAGACTTCTCATAdTT 
TCTTTGAAGACTCATAGAAAGCTGGAT 

Sequence ID 699

ATTAAGGTTTGTNCCCAACAAGAATAGATGTAATTAGAAAAAANTGNCTTCCTTAC 
CTATTGCCTCTGATNTTTACTTGCTTAAATTTTTTTTATTGNAAATCCAGAAAAAG 
NGGATTTAGAGAACAACACTAACTCCCACCTAATCTATGACAGANATGTACAANAN 
AGTACCTGTGAAAAATGTGAAAGNATNTGAAAAATGTAACCTTTGGCAGCCTGAGC 

ATAGTCAACCAGAAAAACTATCTGAATTAAAATAATTGGTCCATAGGTACTATTTT

TTTCCCACGTAGGTTACTGATACCTGAAGACTTTTTNCACCTTTAACCTTNCTCGT 
TGAGGAGCTTTGTANTCTAATAAAAGAGAAATATAAGTAAATGTTAGATATATGGG 
NGGATAATGGTAACTATGTGCTTAAAGAGGTATAAAAGAAGGGTAGGGAGCAGATA 

AGACAAAGGAAGGGCTATATTATAANGAAGAATATTCCAAGTAGGGAAGAGAAAAA

GATATGTTATCCATATAATATTTTATGTGCAGTAGAGAACATGTTCTATAGAANAG 
ACAGAAGATG 

Sequence ID 700 

CTTGAGCCCAGGAATTCCAGCCTGGGCAATATAGTAAGACTCCGTCTCTACAAAAG

ATACAAAAATTAGCCAGATGTGGTGGTGCGTGCCTGTAGTTCCAGATACTGGAAAG 
ACTGAGGCAGGAGGATTGCTTGAGCATGGGAAGTTGAGGCTGCAATGAGCTGTGAT 
TACGCCACTACACTCCAGCCTGGGCAACAGAGTAAGATCTTGTCTCAAAAAAAAAA 
TTGAATTCAGCTAAAAATAATAAAATTTTAAAATAATTTTAAAAAGCCCTCAACAG 
CTTTGTTTTTCTCTCCTTGCCAGCTTCTCTGCAGCCTATAGCCTGCAGGCTGGCTG
CTGCGAGCCAGGACAAGCGGTGGGAAATGCAATCACAGCGTGAAATCTCTGTGTTC 
AGAGACACGCAGGAAGCAGGTGAACCATGAAGGGCC!AAC^lCATGCCCCCAGTTAGC 
AGGGTGTAGAGACCGGGGCAGGGCTTTCTTCTTCCTTCTGGGTTATAAATATCCAT
GTCCTGCCATTTGAAGCTGCAAGTGGCACACATGGATGCTGGACAGGCGCTCGCAC 
TTTCTGGGCAGGGCANGGGGCTCAAAGGCAGGACAGCTGGGCAAAAGCACCTTGCG 
TGGGCCC 

Sequence ID - 701 nt : 579 

CTTTGGAGCTTCTGTCTGTGCTGTGGACCTCA^TGCAGATGGCTTCTCAGATCTGC 
TCGTGGGAGCACCCATGCAGAGCACCATCAGAGAGGAAGGAAGAGTGTTTGTGTAC 
ATCAACTCTGGCTCGGGAGCAGTAATGAATGCAATGGAAACAAACCTCGTTGGAAG 

TGACA2^ATATGCTGCAAGATTTGGGGAATCTATAGTTAATCTTGGCGACATTGACA
ATGATGGCTTTGAAGGTAATTAAAATTATCAAATTGGTGCTTGATTTCTGCTTTTA 
AAATGGTTTATGGAAGAAJ^ATATGATTAAAGTTTTGTATTGTTTTCCTTCCTATAG 
AAGATGGAGCGAGAATGGCATGCTAAGTTTTTTCTTTTCTTTAGTGTTATATATGA 
CTTCTCCTCAATTGTCACCCATTGATCTTTACCACTGTTAATAATGGATGATATTC 

AAAATACCTTATTTCAGTGATTCTAAGGGACCATTGATTAGAAACTGCATTATTAT
TTATGTGTCCCTAAAAGCTACCTATTAAGCTGTTACACCCACCATTTTTCTGTTAA 
GAAAATCCTGATTTCAGAA 

Sequence ID 702 

GTNNTCCTCTCGGAACGCGCCTTNTGTAGCCAGGTGCTACCAGACCNAATACACGG
TTGTTCCAGCTTGCGCATTCACCGATGGCGTAGATATCCGGATCGGAAGTCTGGCA 
GGAATCATTAATGACAATACCCCCACGCGGAGCAACGTCCAGACCACACTGGGTTG 
CCAGCTTATCGCGCGGACGGATACCGGTAGAGAAGACGATAAAGTCGACTTCCAGT 
TCGCTGCCGTCGGCAAAACGCATGGTTTTACGCGCTTCAACACCTTCCTGCACAAT 

CTCAAGGGTGTTTTTGCTGGTGTGAACGCGCACGCCCATACTTTCGATTTTGCGAC
GCAGCTGCTCGCCGCCCATCTGATCAAGCTGTTCTGCCATCAGCATAGGGGCAAAT 
TCGATAAC 

GTGGGTTTCAATACCTAAGTTTTTCAGCGCGCCTGCGGCTTCCAGACCTAACAGGC 
CGCAATTCGAGCTCGGCCGACTTGGCCAATTCGCCCTATAGTGAGTCGTATTACAA
TTCACTGGCCGTCGTTTTACAACGTCGTGACTGGGAAAACCCTGGCGTTACCCAAC
TTAATCGCCTTGCAGCACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAAGAGGC 
CCGCACCCGATCGCCCTTTCCAACAGTTGCGCACCTGAATGGCGAATGGAAATTGT 
AAGCGTTAATATTTTGTTAAAATTCGCGT 

Sequence ID 703

CTGCGCAGACCAGACTTCGCTCGTACTCGTGCGCCTCGCTTCGCTTTTCCTCCGCA 
ACCATGTCTGACAAACCCGATATGGCTGAGATCGAGAAATTCGATAAGTCGAAACT 
GAAGAAGACAGAGACGCAAGAGAAAAATCCACTGCCTTCCAAAGAAACGATTGAAC
AGGAGAAGCAAGCAGGCGAATCGTAATGAGGCGTGCGCCGCCAATATGCACTGTAC 
ATTCCACAAGCATTGCCTTCTTATTTTACTTCTTTTAGCTGTTTAACTTTGTAAGA 
TGCAAAGAGGTTGGATCAAGTTTAAATGACTGTGCTGCCCCTTTCACATCAAAGAA 
CTACTGACAACGAAGGCCGCGCCTGCCTTTCCCATCTGTCTATCTATCTGGCTGGC
AGGGAAGGAAAGAACTTGCATGTTGGTGAAGGAAGAAGTGGGGTGGAAGAAGTGGG 
GGTGGGACGACAGTGAAATCTAA 

Sequence ID 704 

CTTGTATTCAAGAACTACTGTAATGCATTAGTGGTCTGGCTTCATTTTGTATGATG
CCAGATCCTTAATTTACCCAGCACAATCATTTCAGTAGTTTCCTATGGCTCCTGCA 
AAAATGCAAACAGAAACCACCACAGGAACAGCCCCTTGCTGCCTCCTGTTGCTGAG 
GTAGTAGTCGCTAAAGAAAATTGAAGGCTCCTTACAATCTATATTTGAAAACTAGA 
ACTTCTGTAGAAACACACAGATCCCGATCTTAGAAGTTGTACAGGACAATCTGGT 

AAACTGACATAATTGTGATTTATTAACATGAATTAAAATGCCCAACCAGTGCTTCA
GTGTGACAGTATATTTAAAATAAAAAAGAAATTAAAGGTCATATACTGTACTACTT 
TC^CAAAGATCCACAGTTTTGGAAAA 

AATGAGAAAAGCTGTAAGCAATTATATACGCAAAAGAAATGGCAGTA 

Sequence ID 705

TTCC^GTCCTTTCATTTAGTATAAAAGAAATACTGAACAAGCCAGTGGGATGGAAT 
TGAl^GAACTAATCATGAGGACTCTGTCCTGACACAGGTCCTCAAAGCTAGCAGAG 
ATACGCAGACATTGTGGCATCTGGGTAGAAGAATACTGTATTGTGTGTGCAGTGCA 
CAGTGTGTGGTGTGTGCACACTCATTCCTTCTGCTCTTGGGCACAGGCAGTGGGTG 

TAGAGGTAACCAGTAGCTTTGAGAAGCTACATGTAGCTCACCAGTGGTTTTCTCTA

AGGAATCACAAAGGTAAACTACC(^^CCACATGCCACGTAATATTTGAGCC^TTCA 
GAGGAAACTGTTTTCTCTTTATTTGCTTATATGTTAATATGGTTTTTAAATTGGTA
ACTTTTATATAGTATGGTAACAGTATGTTAATAGACACATACATATGCACACATGC 
TTTGGGTCCTTCCATAATACTTTTATATTTGTAAATCAATGTTTTTGGAGCAATCC 
CAAGTTTAAGGGAAATATTTTTGTAAA

Sequence ID - 706 nt : 496 

CAACCCTCTCTCCTCAGCGCTTCTTCTTTCTTGGTTTGATCCTGACTGCTGTCATG 

GCGTGCCCTCTGGAGAAGGCCCTGGATGTGATGGTGTCCACCTTCCACAAGTACTC 

GGGCAAAGAGGGTGACAAGTTCAAGCTCAACAAGTCAGAACTAAAGGAGCTGCTGA

CCCGGGAGCTGCCCAGCTTCTTGGGGAAAAGGACAGATGAAGCTGCTTTCCAGAAG 
CTGATGA6CAACTTGGACAGCAACAGGGACAACGAGGTGGACTTCCAAGAGTACTG 
TGTCTTCCTGTCCTGCATCGCCATGATGTGTAACGAATTCTTTGAAGGCTTCCCAG
ATAAGGAGCCCAGGAAGAAATGAAAACTCCTCTGATGTGGTTGGGGGGTCTGCCAG 
CTGGGGCCCTCCCTGTCGCCAGTGGGCACTTTTTTTTTTCCACCCTGGCTCCTTCA 
ACACGTGCTTGATGCTGAGCAAAGTTCAATAAAGATTTTGGGAAGTTT 

Sequence ID - 707 nt: 397 

CGGATGTGGTGGCAGGCGCCTCTAGTCCCAGCTACTCGGCAGGCTGAGGTAGGAGA 
ATGGCTTGAACCCAGGAGGTGGAGCTGACAGTGAGCCGAGATCGCGCCACTGCACT 
CCAGCCTGGGCGGCAGAGCGAGACTCCATCTCAAAAAAAAAAAAAAAAAAAATAGA 
CTTTGAGACCAGCCTGACCAACATAGTGAAACCCGTCACTACTAAAAATACAAAAA
TTACCCGGGCGTGGTGACGGGCGCCTGTAATCCCAGCTACTTGGGAGGCTGAGACA 
GGAGAATCACTTGAACCAGGGAGGCGGAGGTTGTAGTGAACTGAAATCGTGCCCCT 
GCACTCGAGCCTGGGTAACAAGAGC(^^ 
AAAAT 

Sequence ID - 708 nt: 293 

CCAGCTTTTTATGGTGTTTAATCTAATACACTTAAGCTGCAGTCCCAAAATTAGGG 
GTCCTTCAGTCTTGGAGACTATAAGGGAGCCTCTGCACCCAGGGAAAATGTTACCC 
TTTACAGGGGGGAAGGGTAAACCAGTAGGGAATA 
GGAGGGGCGGGAGGGAGGTGTTGCCGTCACTGTATTAAGTCGATGTTGGGAAACGT
TTTAACATCTGGAGCCTTTGTGGGTGGAAATATGTCTCCAGTTACAACTCCGCAGT 
GGATGTGAAGAAG 

Sequence ID 709 

GGAAGCTACAATGATTTTGGGAATTACAACA^

GAAGGGAGGAAATTTTGGAGGCAGAAGCTCTGGCCCCTATGGCGGTGGAGGCCAAT 
ACTTTGCAAAACCACGAAACCAAGGTGGCTATGGCGGTTCCAGCAGCAGGAGTAGC 
TATGGCAGTGGCAGAAGATTTTAATTAGGAAACAAAGCTTANCAGGAGAGGAGAGC 
CAGAGAAGTGACAGGGAAGCTACAGGTTACAACAGATTTGTGAACTCAGCCAAGCA 

CAGTGGTGGGAGGGCCTAGCTGCTACAAAGAAGAC^

GTGTATGGGCAAAAAACTCGAGGACTGTATTTGTGACTAATTGTATAACAGGTTAT 
TTTAGTTTCTGTTCTGTGGAAAGTGTAAAGCATTCCAACAAAGGGGTTTTAATGTA 
NATT 

Sequence ID 710

TGGATTCCCGTCGTAACTTAAAGGGAAACTTTCACAATGTCCGGAGCCCTTGATGT 
CCTGCAAATGAAGGAGGAGGATGTCCTTAAGTTCCTTGCAGCAGGAACCCACTTAG 
GTGGCACCAATCTTGACTTCCAGATGGAACAGTACATCTATAAAAGGAAAAGTGAT
GGCATCTATATCATAAATCTCAAGAGGACCTGGGAGAAGCTTCTGCTGGCAGCTCG 
TGCAATTGTTGCCATTGAAAACCCTGCTGATGTCAGTGTTATATCCTCCAGGAATA 
CTGGCCAGAGGGCTGTGCTGAAGTTTGCTGCTGCCACTGGAGCCACTCCAATTGCT 
GGCCGCTTCACTCCTGGAACCTTCACTAACCAGATCCAGGCAGCCTTCCGGGAGCC
ACGGCTTCTTGTGGTTACTGACCCCAGGGCTGACCACCAGCCTCTCACGGAGGCAT 
CTTATGTTAACCTACCTACCATTGCCCTGTGT 

Sequence ID - 711 nt: 498 

GTGGTACATATACACAAAGGAAAACTATGTAGCCATTAAAAGAAAAGGAACTCCTA
TCATTTGTAACAACATAAATAAATCTGGAGGAGATTAGGCTAAGGTGAAATAAGCC 
AGGCACAAAAAGACAACTACCATATGATCTTACTTATACGTGTGTGGAATCTAAAA
AGGTGGAATTTACAGAAGCAGAGAGTAGAATGGTGATTACCAGAGGCTGGGG 
AGGGCAGGAGGTTGGAGAAATGTTGGTCAAAGGATACAAAGTTTCAGTTATACAGG 
ATGAATAAGTTCAAGAGATCTATTGTACAACGTGGTGGCTATAGTTGATAACAATG
TATTGTGTTCTTGAAAAATGCTGAGAGAGTAGATTTTAAGTGTTCTCACCAC3AAAA 
CATAAGTATGTGAGGTAATGCATGTGTTAATTANCTTAATTTAGACATTTCATAAT 
GTATTATACATATTTCAAAACCACGTTGTAGATGAGAAAGATACACAATT 

Sequence ID 713

GCCCAGTCGACCCATGTTCTCCTTTCTACACCAGCATTAGACGCTGTCTTCACAGA 
TTTGGAAATCCTGGCTGCCATTTTTGCAGCTGCCATCCATGACGTTGATCATCCTG 
GAGTCTCCAATCAGTTTCTCATCAACACAAATTCAG 

GATGAATCTGTGTTGGAAAATCATCACCTTGCTGTGGGTTTCAAACTGCTGCAAGA 

AGAACACTGTGACATCTTCATGAATCTCACCAAGAAGCAGCGTCAGACACTCAGGA

AGATGGTTATTGACATGGTGTTAGCAACTGATATGTCTAAACATATGAGCCTGCTG 
GCAGACCTGAAGACAATGGTAGAAACGAAGAAAGTTACAAGTTCAGGCGTTCTTCT 
CCTAGACAACTATACCCGATCGCATTCAGGTCCTTCGCAACATGGTCACTGTGCAG 
ACCTGAGCAACCCCACCAAGTCCTTG 

Sequence ID 714 

CTGTAACAGAGATTCCTTTTTTCAATAATCTTAATTCAAAAGCATTATTAGACTTG 
AAAGGGTTTGATAATCTCCCAGTCCTTAGTAAAGATTGAGAGAGGCTGGAGCAGTT 
TTCAGTTTTAAATGAGTCTGCAGTTAATATCAAATGTGAGTTTGGGACTGCCTGGC 

AACATTTATATTTCTTATTCAGAACCCTTGATGAGACTATTTTTA2VACATACTAGT

CTGCTGATAGAAAGCACTATAGATCCTATTGTTTCTTTCTTTCCAAAATCAGCCTT 
CTGTCTGTAACAAAAATGTACTTTATAGAGATGGAGGAAAAGGTCTAATACTACAT 
AGCCTTAAGTGTTTCTGTCATTGTTCAAGTGTATTTTCTGTAACAGAAACATATTT
GGAATGTTTTTCTTTTCCCCTTATAAATTGTAATTCCTGAAATACTGCTGCTTTAA 
AAAGTCCCACTGTCAGATTATATTATCTAACAATTGAATATTGNAAATATACTTGG 
CTTACCTCTCAATAAAAGGGTCTTTTCTATT 

Sequence ID 717 

TCCACCCACCTTGACCTCCCAAAGTGCTGGGATTATAGGCGTGAGCCACCTCGCCC 
AGCCCGATACTAGGACTTATGCAGAAAAAACCTTGACATGGAGGAAAGTAAGATCT 
AAATAAATACTGTATTCATAGATTAAAAGACTC^ 
CCCCAGATTGATGTACAGATTTAACACAATT

TGTAGATATGTAAAAGATOATTCAAAAATGTAAAAGGAAGGACAAAGGACTAGAAT 
AGATAAAACAAAATGGAGAAAGATTTAATAGGAATCACTGTAACTGATTTTAAGAC 
ATACAGAACAATAATAGAAACTGCTTGTATT^ 

GACATACCTGAGATTGGCAATTACAAAGGAAAGANGTTTATTGGCTTACAGTTCCC 
ATGGCTGGGGAGGCCT

Sequence ID 718 

CTCCTCTGGGTTGAAACCCGGGCGCCGCCAAGATGCCGGCTTACCACTCTTCTCTC 
ATGGATCCTGATACCAAACTCATCGGAAACATGGCACTGTTGCCTATCAGAAGTCA 

ATTCAAAGGACCTGCCCCCAGAGAGACAAAAGATACAGATATTGTGGATGAAGCCA
TCTATTACTTCAAGGCCAATGTCTTCTTCAAAAACTATGAAATTAAGAATGAAGCT 
GATAGGACCTTGATATATATAACTCTCTACATTTCTGAATGTCTGAAGAAACTGCA 
AAAGTGCAATTCCAAAAGCCAAGGTGAGAAAGAAATGTATACGCTGGGAATCACTA 
ATTTTCCCATTCCTGGAGAGCCTGGTTTTCCACTTAACGCAATTTATGCCAAACCT 

GCAAACAAACAGGAAGATGAAGTGATGAGAGCCTATTTAG^CAGCTAAGGCAAGA
GACTGGACTGAGACTTTGTGAGAAAAGTTTTCGACCCTCAGAATGATAAACCCAGC 
AAGTGGNGGGCTTGCTTTGTGAAGAGACAGTTCATGAACAANAGTCTTTCAGGACC 
TGGACAGTGAAGGGAGCCCGGGCAGCCA 

Sequence ID 719

CGNGGCCGCGTNAACTTTTGATCGTCAGCTGGGGCTGGCAGGCACCTAAATGGGAA 
GGGTGATAGCAGTGTGTTGGGGGGAGTTTAGGGAACGGTCCTCTACCGATAGAGGC 
AGCANCTCATTGGAATTTCCTCCTGAAGTTGTCTTGCCCCTTGAATCCTGCAGGAA 
GGCTGGCAAATGGCCATTTCCCTTCCACTTGAATAGAGACCCATAACTCAAGTATC 

TGCCCTTAAGACACCACAGGACTGTTCTTCGCGGGCCCTGCCCCTGGATTTGGGAG
AGGCAGTCCANCTCACCCAACTAGGCTCTGCANGGGGACCANGAGGGATGGGTTGT 
GTCCACAGGACCAGCCAGACTGATGAGGGATGCGGCAAGCATATTCTCACCACCTT 
CTTTCACGTTTACAACANACG^^

CGAGGACCTANAATCATGGAGTGCTCTGGGGATCCGGGCTTGGA 

Sequence ID 720

TCAGTGTTGAATTTTGTCAGACACTTTCTCTGCATCAATTGGTATGACCATGTGAT
TTTTTTTCTGTAGCCTGTTAATATGGTTAATTTTCAAATATTGAGCTGATTAATTT 
TCAAATATTGAGCTCTCCTTGCATCTCTGGAATAAGTACCACTTGGTCGTGGTATA 
TATTTCTTTTAATATATTGCTGAATTCTGTTTGATCATGTTTTCTTAAAGACTTTC 
GTGTCTGTTTTCATGATAGATACTGGTCTATAGTTTTGTTGTAATATCTTGGTTTG 
ATTTTGATATCAGGATAATGCTACCTTAATAGAATGAATTGGAGCCAAGTATGGTG
GCAAATGCCTATAGTCCTAGCTACTCAGGAGGCTGAGGTGGTGGGGACTGCTTGAC 
CCANGAGTTCAAATCTAGCTTGGGCAATGTAGCAAGAC

Sequence ID 721 

TAGAAGGAATGACTATTCATGTCCAAAGTGAATGGTTTTGTGCAGTGAACAACACA
TGGCGAGGTACTAACTGAGAAACTTTTTCATGGTTTATGCCTACCTCTTGTAGTTG 
TTGCAGAGCAAATATAAATTGTAATAAGATAGCTAGGCCTTGCAGAAACAAACAGA 
AAAACTTAAAAAAAAATGATATAAGAGCTGGAGTCTAGTATTTATATGAATCTGTG 
AGAGATAATTTTTTTGGTCTCACTGCAATGAACCAAAAGCGGCTGAGTTTGGTTTT 

TAATTGTAGCCATGTATTGAAGGCATCTTTTTGACCAACTCTTGTTGGTTCTGTCT

TGAACCATTGTTAATCACTGTGCTGTAATTAGTATAGCTAAATCTTTTCCTTCCTT 
GCTCCTCCCCCAGCCC^CCCCGTCTTCCCTTAACATTTTTTCAGGGGGGGTTGGGA 
GTGGTTTCATTTTAATGTGAGTGGATGTTTTGATAGTTGTAAGGAAAAAATGCATT 
TC^GACACATTTCACACATGAGCTATTTTCTTACACAGTATGTCTTATTGGTAATA 
AGAATGTAATTCAT

Sequence ID 722 

CNTTCCNTAAGAATACAAAAAATTAGCTGGGCGTGGTGGCAGGCGCCTGTAATCCC 
ATCTACTCAGGAAGCTGAGGCTGGAGAATCGCTTGAACCCGGGAGGCGGAGGTTGC 

AGTGAGCAGAGATCACGCCACTGCAGTCCAGCCTGGGCAACAGTGCGAGACTCTGT

CTCAAAAAAAAAATAAATAAATTACCTGGGTGTGGCAGCGCGTGCCTGTAATCCCA 
GCTACCCAGGAGGCTGAGGCAAGAGAACTGCTTGAACCCAGGAGGCAGAGGTTGCA 
TGGAGCTGAGATGGCGCCACTGCACTCCAGTCTGGTGACAGAGTGAG 

Sequence ID 724

CTCTCTACTAAAAATACAAAAATTAGCTGGGCACGGNGGTGCATGCCTGTAAACCC 
AGCTACCAGGTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCAGGGAGTCGG 
AGGTTGCGGCGAGCTGAGATCATGCCACTGCACTGCGGCCTGGAGACAAGAGGAAG

ACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAG 

NGNGNGGACCTTATTTGGCTNTTAATTCAAACTATTAAAAATGTGAACN 

Sequence ID - 726 nt : 260

CGGGGTCTGTACCGGGCTGGCCTGTGCCTATCACCTCTTATGCACACCTCCCACCC 
CCTGTATTCCCACCCCTGGACTGGTGGCCCCTGCCTTGGGGAAGGTCTCCCCATGT 
GCCTGCACCAGGAGACAGACAGAGAAGGCAGCAGGCGGCCTTTGTTGCTCAGCAAG 
GGGCTGTGCCCTCCCTCCTTCCTTCTTGCTTCTCATAGCCCCGGTGTGCGGTGCAT 
ACACCCCCACCTCCTGCAATAAAATAGTAGCATCGG

Sequence ID 727 

CTGAGTNTAGAAATGATGCCATTAATACTGATTGCAAAAACATTACAACTCAGTAC 
TGCAGCTTTCATTCAAATAGGTTATATGTATAAACTGAGTTCAACAATATTGTATT 
TGAGATGGTAAAGTTAAAGAAATGCAATAATGTAAATAATACTTAAGAAAATAAGA

TCTCAGGAAACTGTATATACTCTGTACTTTTATGGAACTTTATCAGATCATTTCAG
TATATGCATCAAGGATATAGTGTATATGACATGAACTTTGAGTGCAAAAACTGTAC 
TATGTACCTTTTGTTTATTTTGCTGTCAACATCTAAATAAAGGTTTTTTTGTTTGT 

ATTGTTTTAATTAAACAATTGTTTAATTGTTTTAAAGTCGCCAGGCTGAGGCAGGT
GAATCACAAGCTTAGGAGTTGGAGGCTAGCCTGCCAACATGGTGAAACCCCGTCTC 
TACTAAAAATACAAAAAAATTAACTGGGTGTGGG 



Sequence ID 728 

CCCATCTGCACCAGTACACAGGCAGGC

GTGGCAACTTGGGATTCATTCTGGTGATTCTGAACCTTGCCTCATAGCTTAAAGTA 
TAAAAAAGATTCAAGAGCAGTGAGGTTTGTTCTTTCCAGTGAATGGTGGACTGAGT 
GGTGCGAGGTGGAGGGCTAACAAGAGGAAAGAACTACATTCTTCAGAATACAGTGA 
TGAAAATTCATTTTGAAACTCAAATATTTTCATTTTGGATATTCTCCTGTTTTTAT 

TAAACCAGTGATTACACCTGGCCATCCCTCTAAATGTTCTAGGAAGGCATGTCTAT

TGTGATTTTGATGAAGACAGAATTATTTTTCTCTGTAGAAACACAGATACCACTTT 
ATCAGGGGAAGTTAGTCAAATGAAATGGAAATTGGTAAATGGACAAAAGCTAGCTA 
GTAAAAAGGACGACCCAGCAACATGCTTTAACCCCATTGTATGTTTGTGGAAAGAG 
CATAGTTTAACATCTTGAGAAATTTGGGACATAAAAGTTTTCATNGGTAGACAGTT 
CATGGCAGTATATGAATTGACATAATGGAAATAATCTGATTTTATTTTTACAACTA
ACATCCTTTCCCC 

Sequence ID - 736 nt: 641

GGAATTCCAAGTGCTTGGGGATAATGATACCTCTGACCTTTCTTCCTTTTGGGAAG 
TACTTGAGTGTGCAGCTGCATGAGGCCTCAGCAGGAGAGAGATTTTAGGTCCAAGA 
AGCTATACCAGTAGGACAAGGCAGGAAAATACTACACTTTCAGGATCAAGCCCCTC 
TGACTCTCATTTGGAAACTGGATGTTTGCTAAGCACCTGCTTCTTAAGGATGCCGA
GGGATTTAATGATACTCCCAGAAACCTGGAGAGATTAATGGGGCCTATGGAGAAGT 
GCTCTGAACTCAGTGTTGGGACTTGAATAAAATTAAC(^TTGT<^TGTTTT(^GAA 
CAACTAAGCTGTTTTATATTTCATGTGCATGAAAGCCCTAGAACTAAGTTGTGTTA 
TTTCCAGAAATGAAATAGATCCCACAGTTAGATGATGTGGCCmTTAGGAAGTACCA 
AATTTATAAAAATCACTGGAGGTCTGTCTGAGpAGTACCTAATAAAATATAGTATA
CTGAAAGTGAACAGATACTTTGTCTCTTTCTTTGGCTGCTTGATCTTTATCTGTGT 
CTGCCGTACAGTGCACCCTTAAAGTATTCTACACCAGTGCTTCTCAAACTGGAAAT 
GTGCATGTAAGTCACCCANGGGTCT <? 

Sequence ID 739

TGCATGCCCATAGTCCCAGCTATTTGGGAGGCTGAGGCAGGAAAATCGCTTGAACC 
CGGGAGCCAGAGGTTGCAGTGAGCCGAGATCGCACTCCAGCTTGGCGACAGAACAA 
GACTCTGTCTCAAAAAAAAAAAAAAAAGAAATCTTGGGATCCTGAACCCCTTACTC 
GAAGGGCTAAGGTAGCATCTCAGCATGTCTTATTCGAGACTTCGTANAACCAGACC 

TGCTGTTTGTAGATGTTAATTAATCAAACCTTTCTCTACTCATTCTGGACCAGTTA
AGGTTTTCTCCTTCTCCGTATGAGTTTTGATTTTCGTCCTCCTTGGTTGGAGATCA 
CACTTTGGTCTGCTGCTAAGTTGGATGCCTCCCACTGTCTTTCCCTAAGTCTAGGG 
CTT(^^ACCCCAGTGTGGGGAGAGGGACTTTCGTTTCCTGCCCCTCACCACATCAG 
AC^CAGGCAGGCAAGAATAAGATGGC 

TGGGACATTACCTGTTACTAGGTGGACTTCACTGCCTGTGAATGGAAGCTGAAGGG

CTGTTTTTTTGGTTTGTATTTGGACAGGCCAGGCTTANAGAGGGAGAGAACTGGGC 
TACTCTTCAGCAGTGATCTTTAAAATGCC 

Sequence ID 747 

CAGAGTGCAAGACGATGACTTGCAAAATGTCGCAGCTGGAACGCAACATAGAGACC 
ATCATCAACACCTTCCACC^ATACTCTGTGAAGCTGGGGCACCCAGACACCCTGAA 
CCAGGGGGAATTCWIAGAGCTGGTGCGAAAAGATCTGCAAAATTTTCTC^ 
AGAATAAGAATGAAAAGGTCATAGAACACATCATGGAGGACCTGGACACAAATGCA 

GACAAGCAGCTGAGCTTCGAGGAGTTCATCATGCTGATGGCGAGGCTAACCTGGGC

CTCCCACGAGAAGATGCACGAGGGTGACGAGGGCCCTGGCCACCACCATAAGCCAG 
GCCTCG<5GGAGGGCACCCCCTAAGACCACAGTGGCCAAGATC^CAGTGGCCACGGC 




CACGGCCACAGTCATGGTGGCCACGGCCACAGCCACTAATCAGGAGGCCAGGCCAC
CCTGCCTGTACCCAACCAGGGCCCCGGGGCCTGTTATGTCAAACTGTCTTGGCTGT 
GGGGCTAGGGGCTGGGGCCAAATAAAGTCTCTTTCCTC 

Sequence ID - 757 nt : 583

GAACCCTGCGGAGGGACTTCAATCACATCAATGTAGAACTCAGCCTTCTTGGAAAG 
AAftAAAAAGAGGCTCCGGGTTGACAAATGGTGGGGTAACAGAAAGGAACTGGCTAC 
CGTTCGGACTATTTGTAGTCATGTACAGAACATGATCAAGGGTGTTACACTGGGCT 
TCCGTTACAAGATGAGGTCTGTGTATGCTCACTTCCCCATCAACGTTGTTATCCAG 

GAGA^TGGGTCTCTTGTTGAAATCCGAAATTTCTTGGGTGAAAAATACATCCGCAG
GGTTCGGATGAGACCAGGTGTTGCTTGTTCAGTATCTCAAGCCCAGAAAGATGAAT 
TAATCCTTGAAGGAAATGACATTGAGCTTGTTTCAAATTCAGCGGCTTTGATTCAG 
CAAGCGACAACAGTTAAAAACAAGGATATCAGGAAATTTTTGGATGGTATCTATGT 
CTCTGAAAAAGGAACTGTTCAGCAGGCTGATGAATAAGATCTAAGAGTTACCTGGC

TACAGAAAGAAGATGCCAGATGACACTTAAGACCTACTTGTGATATTTAAATGATG
CAATAAAAGACCTATTGATTTGG 

Sequence ID - 758 nt : 424 

CTTGGCTCCTGTGGAGGCCTGCTGGGAACGGGACTTCTAAAAGGAACTATGTCTGG 
AAGGCTGTGGTCCAAGGCCATTTTTGCTGGCTATAAGCGGGGTCTCCGGAACCAAA
GGGAGCACACAGCTCTTCTTAAAATTGAAGGTGTTTACGCCCGAGATGAAACAGAA 
TTCTATTTGGGCAAGAGATGCGCTTATGTATATAAAGCAAAGAACAACACAGTCAC 
TCCTGGCGGCAAACCAAACAAAACCAGAGTCATCTGGGGAAAAGTAACTCGGGCCC 
ATGGAAACAGTGGCATGGTTCGTGCCAAATTCCGAAGCAATCTTCCTGCTAAGGCC 

ATTGGACACAGAATCCGAGTGATGCTGTACCCCTCAAGGATTTAAACTAACGAAAA

ATCAATAAATAAATGTGGATTTGTGCTCTTGT 

Sequence ID - 764 nt : 626 

GATTTTTTTTTTTTTTTTGAGATGGAGTCTTTCTCTGTCGCCCAGGCTGGAGTGCA 

GTGGTGAAATCTCGACTCACTGCAACCTCCGTCTCCTGGGTTCAAGCAATTCTCCT

GCCTCAGCCTCCTGAGTAGCTGGGATTACAGGCACCAGCCACCACGCCCGGCTAAT 
TTTTGTATTTTTAGTAGAGACAGGTTTTCACCATGTTGGCTAGGCTGATTTTGAAC 
TCATGACCCCAAGTGATCTGCCCGCCTCGGCCTCCCAAAGTGCTGGAATTACAGGT 
GTGAGCTACCACTCCCAGCCAATGATTACATTTATAAGGTAAAATAACTTGTGCCA 
ATCTGTACAAGTGAATTCAGATTTAAAATTTTAATTGTAAAAAGATATCCAGGTGA
TATTTCTCCCTGAATAATTTAGTTTCCTTTTCTATTTCTTGATATAAAAGTACTCA 
GCATTGAAGTAATTGCTATCTTCACATTTCTTCCTATTTGAGCTGTCTAAATAAGT 
AGTCCTACATATTTTCCCCCCAACACAT^AAAACCCAGAAAAGAATTATTTTATACT
GGATTTTTTTGGTTGTAGCAGGAACCTAAAGGNGCCAATTGTAACATGCATGTTCT 
TTTTGGCAAA 

Sequence ID 766

GTCCATCCTGCAGGCCACAAGCTCTGGATGAGGAACTTGAGGCAAGTCACCAGCCC 
CTGATGATTTCGCCTAAAAGAGCAAGGACTAGAGTTCCTGACCTCCAGGCCAGTCC 
CTGATCCCTGACCTAATGTTATCGCGGAATGATGATATATGTATCTACGGGGGCCT 
GGGGCTGGGCGGGCTCCTGCTTCTGGCAGTGGTCCTTCTGTCCGCCTGCCTGTGTT 

GGCTGCATCGAAGAGTAAAGAGGCTGGAGAGGAGCTGGGCCCAGGGCTCCTCAGAG
CAGGAACTCCACTATGCATGTCTGGAGAGGCTGCCAGTGCCCAGCAGTGAGGGACC
TGACCTCAG^GGCTVGAGACAAGAGA^CACCAAGGAGGATCeAAGAGCTGACTATG
CCTGCATTGCTGAGAAGAAACCCACCTGAGCACCCCAGAC^CCTTCCTCAACCCAG
GCGGGTGGACAGGGTCCCCCTGTGGTCCAfeCCAGTAAAAACCATGGTCCCCCCACT 

TCTGTGTCTCAGTCCTCTCAGTCATCTCGAGCCTCCGTTCAAAATGATCATCATCA
AAACTTATGTGGCTTTTTGACCTTTGAATAGGGAATTTTTTAAAATTTTTTAAAAA 
TT 

Sequence ID 768 

CCAGCGCAGGGGCTTCTGCTGAGGGGGCAGGCGGAGCTTGAGGAAACCGCAGATAA
GTTTTTTTCTCTTTGAAAGATAGAGATTAATACAACTACTTAAAAAATATAGTCAA 
TAGGTTACTAAGATATTGCTTAGCGTTAAGTTTTTAACGTAATTTTAATAGCTTAA 
GATTTTAAGAGAAAATATGAAGACTTAGAAGAGTAGCATGAGGAAGGAAAAGATAA 
AAGGTTTCTAAAACATGACGGAGGTTGAGATGAAGCTTCTTCATGGAGTAAAAAAT 

GTATTTAAAAGAAAATTGAGAGAAAGGACTACAGAGCCCCGAATTAATACCAATAG

AAGGGCAATGCTTTTAGATTAAAATGAAGGTGACTTAAACAGCTTAAAGTTTAGTT 
TAAAAGTTGTAGGTGATTAAAATAATTTGAAGGCGATCTTTTAAAAAGAGATTAAA 
CCGAAGGTGATTAAAAGACCTTGAAATCCATGACGCAGGGAGAATTGCGTCATTTA 
AAGCCTAGTTAACGCATTTACTAAACGCAGACGAAAATGGAAAGATTAATTGGGAG 

TGGTAGGATGAAACAATTTGGAGAAGATAGAAGTTT

Sequence ID 773 

GAGGAAAGGGGAGTTAATATTTAGTGGACAGAATTTCAGTTTTACAGATGAAAAGA 
GTTCTGGAGATAGACGGTGTTGATAGTTGCACAGCAGTGTGAATGTGCTCATTGTT 
ACCGAACTTAAAAATGTTTAACATAGTATTATGTGATTTTTATTTTGCCACTTAAA
AAAAAAGAATGAAGTACTGATACATGCTACAACATGGGTGAGCTTTAAATACATTC 
TGCTCAGTGAAATAAGCa\GATGCAAAAGATCACATATTATATAATCCACTTATAC
GAGATACCTAGAATAGGCAAATTCATAGAGACAGAl^AGTAGAATAGTGGTTCCCAG
GGGCTGGGGACAAGGGGGCAGTGAGAGATTGAGAGTTATTATTAATGCGTACAGAG 
TTTCAGTTTGGGCTGATAAAAAAGTTCTGAAGATGGATGGTGATGATGGTTGTACA 
TCAATGTGAGTGTAATTACCGCCACTGAACTGCCCTTAAAAACGTTTAAAAGAGTA 
AATTTTATGTTGNGTATATTTTACCATAAT

Sequence ID 776 

TTTTTTTTTCATAAGAGGCAAGTACAAGAAAAAGCTTAATTACTTTAACTTCTAAG 
TAGTTTGGAATCTAAATAAATAGGAGTTACCAAATATATGCGCTTCTGTGAATAGT 
TTTCCCCCACATGTTTATTTATATTTTTGCATCTCATCAAACCTAACAGATTCTAA 
AGTCTCTGGTGATAATGACAATATCTGCTACGGAGAGACTAGCCTGGGGGAAGAGG 
ATCTCCCTGAACAAGGATAGCGGAGTTGCTGCAGCTTTCAAATGAAGCTGGACATT 
TAGCTGCGGGGGTAGCACCCTTTGATCAAGGCAGCCCAAAGATGAGTTTCAGGGAT 
GGGACTGACAGAAGAGAAAAGTTCTTCCCAGCCCTTTCTACTTTTTCTCTTTGTTT 
CTCAGGCTTCTGGCCGTCTTCAGTTTTCACAAGTTTCACTCTCAACCCTAAACAGT 
ACTTCTGTGAAGTACCCTTTGGCCCCTCGTTTTCAGCTCCTAAACTCACCTGGAAA 
TAGATGTCAATCTAATTTTGGGTCTGACTAGTGCAGTAGGCATTTTTGGTGA 

Sequence ID 782 

CTCACACAGAACAAAAATGAATGAGTGTGGCTGTGTGCCACTATCACTGTGTCTAC 
AAAAACAGCCAGTGGGCCTGATTTGGCCCTTGGCTGCAGTGCGCCCGTCTCTGTTT 
TTGAGGAATAAAATCGCATCATTTCATATGGCTAATGCAATTTTTTTCCCATCTGG 
AAGCAACATCTGATTGGACTCATCTTGTATGGTGCTTGTTACAGTCTCTGTAAATG 
GGAGAGGGTCCGAGAATAGCTCTTCCTGTTTTCATCAGGACTGTTTTTAGGGATGG 
CAAAGAAGTCAGTGTGTCCAGCCTGTGTCCTCCTCACCACGTGGCTGATTCCTGAA 
TCTGCATGTGCANCACNTGCCGTTGTCTGGGGCATGATCTGTGTGA 

Sequence ID - 785 nt : 556 

CTTTTCTCTGGGTATAGATTTACCCTAGCACCTATCTCATTATATTGAATTTTCCA 
GCATATTTAAATAAACTATTAATTAGTCACACTATTTCTTAAAAGTCACACTATCA
ACTAATCGTGACCGCAATTATCTAGGGGTGATAATGTGCTGAGTCTACTCTTTAAA 
TACACTGGGACCCAGCATATTGAGTTATATTGGCACAGAAACTTCACTCTGGGTAT 
AGATTTACCCTAGTACCTTGCCGGCAGGATCCTATTATTCATGGTTGTACAAGCAA 
GGTTCAGGGAAGAGGCTGGCACAGAGAAGGTACCTGGTAACTGTTGTTTGAGGCTG 
AATTCAGCTCAACTCAGCTCCAGTAGAGATGGTGTCCCCTTCTCTACCGTGTTGAG
ATAGTGTGCAGTCCCTTCCTAAGGGCTGTTACCCACCGCAATAGGACTTGTCAGCT 
TCAACTTTTAAATTTCTCTGCTCCCGCTGGGACCCACCCGCTTCAAAAATCATCAT 



GGNGGNTTTAGCA.CCAATTTAGTAAACACAAACTGTCTGAAATATTTTGGAT

Sequence ID 796

GAACATTCAAGATAGTGAGAGGAAGAAAAAGATATGGCTGTACGGGACCGAGGTCT 
CTTCTATTATCGCCTCCTCTTAGTTGGCATTGATGAAGTTAAGCGGATTCTGTGTA 
GCCCTAAATCTGACCCTACTCTTGGACTTTTGGAGGATCCGGCAGAAAGACCTGTG 
AATAGCTGGGCCTCAGACTTCAACACACTGGTGCCAGTGTATGGCT^AAGCCCACTG 
GGCAACTATCTCTAAATGCCAGGGGGCAGAGCGTTGTGACCCAGAGCTTCCTAAAA 
CTTCATCCTTTGCCGCATCAGGACCCTTGATTCCTGAAGAGAACAAGGAGAGGGTA 
CAAGAACTCCCTGATTCTGGAGCCCTCATGCTAGTCCCCAATCGCGAGCTTACTGC 
TGATTATTTTGAGA2\AACTTGGCTTAGCCTTAAAGTTGCTCATGAGCAAGTGTTGC 
CTTGGCGGGGAGAATTCGATCCTGACACCCTCCAGATGGCTCTTCAAGTAGTGAAC 
ATC(^GACC^TCGC^TGAGTAGGGCTGGGTCTCGGCCA.TGGAAAGCATACCTC^G 
TGCTCANGATGATACTGGCTGTCTGTTCTTAACAGAACTGCTATTGGAGCCTGGAA 
ACTCAGAATGCAGATCTTTTGTGAACAAAATGAAGCAAGAACCGGAGACNCTGAAT 
AGTTTTATTTCTGTATTAAAAACTGNGATTGGAACAATTGAAGA 

Sequence ID 801 

CCACTCCACCTTACTACCAGAC^CCTTAGCCAAACCATTTACCCAAATAAAGTAT 
AGGCGATAGAAATTGAAACCTGGCGCAATAGATATAGTACCGCAAGGGAAAGATGA 
AAAATTATAACCAAGCATAATATAGCAAGGACTAACCCCTATACCTTCTGCATAAT 
GAATTAACTAGAAATGAGGATTCTGACCTTGACTTTGATATCAGCA]\ATTGGAACA 
GCAGAGCAAGGTGCAAAACACAGGACATGGAAAACCAAGAGAAAAGTCCATAATAG 
ACGAGAAATTCTTCCAACTCTCTGAAATGGAGGCTTATTTAGAAAACAGAGAAAAA 
GAAGAGGAACGAAAAGATGATAATGATGATGAGTCAGGTAAAAGTTCCAGAAATGT 
GAACAACAAAGATTTTTTTGATCCAGTTGAAAGTGATGAAGACATAGCAAGTGATC 
ATGATGATGAGCTGGGTTCAAACAAGATGATGAAATTGCTGAAGAAGAAGCAGAAG 
AAGGAAGCATTTCTGAAATATGAATGAAAAAAATTACATCTTTAGAAAAAGAGTTA 
TTAGAAAAAAGCCTTGGCAGCCGTCNGGGGGAAGTGACGCACAGAAGAGACCAGAG 
AATAGCTTCCTGGANGAGACCCTGCACTTTACCCATGCTGCTGGATGG

Sequence ID - 808 nt : 641 

CCGGGTTTTAGTATTTAACCAAGAGCCTTTTAAATATTGAAAACCCATAGTTCAGA 
AAATGTTAGTATTGCTGCCCTTCTTCACATAAATTTTTTTTTAAATTATACTATTA 
TTTTGCTTAATTTTATATTGGGTTAAAACAACCTTCAAGAAGGTTAACTAGGAAAG 
AAGACCTTTTTGTTTTATTTTTACTATTTATATATAGAAGACAAATCAGCATTTGG 
TGATAGTTTTACATGACCAGTTATCAAACGGTC^TAGTATGAAGTGTGCAGTTGTT 
CATTATTAGTAAATTATGTTTGATTTTTAAACTATTTAGTACTAATAGTTGAGATG
AAAACTGAAGAAAAATGCCAATGTGACGTTTGTGTATAGCTAGCCTTAAAAAACTT 
CCCATGTTTTTAGGTGACTTTTTTCCCCCTCTTAGTACTCTGGAGAAACAATGAAG 
ATGGGCGATCTCAATTCCAGATGTAAA 
TGTAACTGCTATTATTGNGGATTCTTGNCTTGNGTATTTTCTTTCCCTTATTCAAG
TAATATAGAATAACTTTCCTTAAAATGATTTGATCCAAGATACGTCATTTCTGTAT 
TGGCAAAATGCCNCTATTAAAGTGT 

Sequence ID - 814 nt: 132 

GTTAAAGTGATACATTTTTATACCAAAT^

TAAAATTGCAATTGTATTAGGTGTTAAAATAAAGTTTTTAAAAAATTAAAAAAAAA
AAAAAAAAAAAAAAAAAAAA 

Sequence ID 817 

GACAACCTTAGCCAAACCATTTACCCAAATAAAGTATAGGCGATAGAAATTGAAAC
CTGGCGCAATAGATATAGTACCGTAAGGGAAAGATGAAAAATTATAACCAAGCATA 
ATATAGCAAGGACTAACCCCTATACCTTCTGCATAATGAATTAACTAGAAATAACT 
TTGC^GGAGAGCGAAAGCTAAGACCCCCGAAACCAGACGAGCTACCTAAGAACAG 
CTAAAAGAGCACACCCGTCTATGTAGCAAAATAGTGGGAAGATTTATAGGTAGAGG 

CGACAAACCTACCGAGCCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAGTTCA
ACTTTAAATTTGCCCACAGAACCCTCTAAATCCCCTTGTAAATTTAACTGTTAGTC 
CAAAGAGGAACAGCTCTTTGGACACTAGGAAAAAACCTTGTAGAGAGAGTAAAAAA 
TTTAACACCC^TAGTAGGCCTAAAAGCAGCCACC^ 
AACACCCACTACCTAAAAAAATCCCAAACA^ 

GGACCAATCTATGACCCTATAGAAGACTAATGTTAGTATAAGTAACATGAAAACAT

TCTTCTNCGCATAAGCCTGCGTCAGATTAAAACACTGAACTGACAATTAA 

Sequence ID - 821 nt : 370 

AAAGAGCTCCCAAATGCTATATCTATTCAGGGGCTCTCAAGAACAATGGAATATCA 

TCCTGATTTANAAAATTTGGATGAAGATGGATATACTCAATTACACTTCGACTCTC

AAAGCAATACCAGGATAGCTGTTGTTTCANAGAAAGGATCGTGTGCTGCATCTCCT 
CCTTGGCGCCTCATTGCTGTAATTTTGGGAATCCTATGCTTGGTAATACTGGTGAT 
AGCTGTGGTCCTGGGTACCATGGCTGGTTTCAAAGCTGTGGAATTCAAAGGATAAA 
TTAATGAAGAAAACAAGCGGAGCTGAAGAAGAAAGTACAATATGGTGCTGTCTTCC 
TAATGAAATAAATTCACTAAATGGACATTAAAAA

Sequence ID 825 
AGACTCGAGCAAGCTTATGCATGCATGCGGCCGCAATTCGAGCTCGGCCACTTGGC
CAATTCGCCCTATAGTGAGTCGTATTACAATTCACTGGCCGTCGTTTTACAACGTC 
GTGACTGGGAAAACCCTGGCGTTACCCAACTTAATCGCCTTGCAGCACATCCCCCT 
TTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCACCGATCGCCCTTCCCAACAGTT 
GCGCAGCCTGAATGGCGAATGGAAATTGTAAGCGTTAATATTTTGTTAAAATTCGC
GTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGCAAAA 
TCCCTTATAAATCAAAAGAATAGACCGAGATAGGGTTGAGTGTTGTTCCAGTTTGG 
AACAAGAGTCCACTATTAAAGAACGTGGACTCCAACGTCAAAGGGCGAAAAACCGT 
CTATCAGGGCGATGGCCCACTACGTGAACCATCACCCTAATCAAGTTTTTTGGGGT 
CGAGGTGCCGTAAAGCACTAAATCGGAACCCTAAAGGGAGCCCCCGATTTAAAGCT
TGACGGGGAAAGCCGGCGAACGTGGCGAGAAAGGAAGGGAAAAAAGCCAAANGGAG 
CCGGCGCTAGGGCCTGGCAAGTGTAGGGGCACGCTGCGCGTAACCACCCACACCCC 
GCCGNGCTTAATGCCCCNTTCAGGGCGCGTNCTGATGCCGNATTTTNTCTTACNCA 
TNTGTGCNGGNTT 


Sequence ID 833 

TAAAATAATGGCAAAAAACAAACAAAAAACAAGTTCTCTAAACAGAAAGGAAATTA 
CrAAAGAAGGAATCTTGAAATAACAGGAAAGAGGAAATACCACAGTAGGCAACATT 
ATGGGTAAATAAAACAGACTTTCCTTCTTTAGTTTCCTAAAATATGTTTGATGATT 
AATGCAAAAATTACAATATTTTCTTATGTAGCACTAAAGGTATGTAGAGAAAATAT
TTAAGATAATTGTACTGTAAGCGGGAGATGACAGTGACATAAAGGCAACGTTTTTA 
TACTTCACTCAAACTTTATGTATTAATGTAATCCATAAAGCAACCAAAAAAGCTAT 
ACTAAGTACATTCAAAAACACAATAGATiyVACCAAACAAAATTCTAAAGGATGTAC 
AAGTAACCCACTGGAAGCTGCAAAAAATGTAAACAGAAACTAAAAACAGAGAATAA 

ATGAAAAATTAAAAACGAAATGGCAGACTTAGGCCCTAATATACAAATTATCACAT

TAAATATAAATGGTCTAAATACACCAACTGTAAGACAGAGATTAGCAAAGTCGATT 
TAAAAACATGACTCAACTACGTGCTGTCTACAAGAAACTCACTTCAAATATACCAA 
GATAGGAAGGTTGAAAGTAAAACGATGGAAAAAGATGTATCATGTGAACATTAATC 
AAAGGAAAGCAGGGGTGGCTATATTAACATCAGGTAAAATAAACTTT 


Sequence ID - 837 nt : 603 

TGAGGNTGGTCATGATGCANAAGCTACTCAAATGCAGTCGGCTTGTCCTGGCTCTT 
GCCCTCATCCTGGTTCTGGAATCCTCAGTTCAAGGTTATCCTACGCGGAGAGCCAG 
GTACCAATGGGTGCGCTGCAATCCAGACAGTAATTCTGCAAACTGCCTTGAAGAAA 

AAGGACCAATGTTCGAACTACTTCCAGGTGAATCCAACAAGATCCCCCGTCTGAGG

ACTGACCTTTTTCCAAAGACGAGAATCCAGGACTTGAATCGTATCTTCCCACTTTC 
TGAGGACTACTCTGGATCAGGCTTCGGCTCCGGCTCCGGCTCTGGATCAGGATCTG 
GGAGTGGCTTCCTAACGGAAATGGAACAGGATTACCAACTAGTAGACGAAAGTGAT
GCTTTCCATGACAACCTTAGGTCTCTTGACAGGAATCTGCCCTCAGACAGCCAGGA 
CTTGGGTCAACATGGATTAGAAGAGGATTTTATGTTATAAAAGAGGATTTTCCCAC 
CTTGACACCAGGCAATGTAGTTAGCATATTTTATGTACCATGGNTATATGATTAAT 
CTTGGGACAAAGAATTTTATAGAAATTTTTAAACATCTGAAAA

Sequence ID - 839 nt : 71

ATTTATCTAATATTTGGTTTAATAAAATGTGAATAATGAAAAAAA2\AA2y\AAAAAA 

AAAAAAAAAAAAAAA 

Sequence ID - 849 nt: 622

TGGTGCGATCTCAGCTCACTGCCACCTCACCTCCTAGGTTCCAGAGATTCTTGTGC 
TTCAGCCTCCTCAGTAGTTGAGAATACAGGAAC^ 
TTGTATTTTTAGTAGAGATGGGGTTTCACC^TGTTGGCCAGGCTGGTCT

CTGGCCTAAGTGACCCACCTGCCTCAGCCTCCCAAAGTGCTGGGATTATAGGCGTG 
AGTCATTGTCCCCAGCCGGATGTTTTCATCTTGATTTGCCTTAGTTTCTAAATCTC 
ATCCTCTCCATTTTCTCCTGTTAGTAGTCACAGAGAACCAAATTCTGTCAAGTTAT 
GAAACTAAAGTCTCTCTTCCACAAGTCTTCCTGTGTTCTGCCTCAAGTGAACTTGA 

AAGAACATCAGTTTGTGGGAAGGTTGAAGACCGAATGATCTGCTGGGAAATCACTG

AGGCATTGCCATTCTCTTGAGGAATTTCATTTTCATCGAAGTTTCGGTTTATATCC 
CTTTCTTGGTGAGTACTATTGCTGTTATGTAAATTAAATGAGTCGTCATCCTTCTT 
NTGAGC 

Sequence ID - 860 nt : 501

GTGAAATCACTTTCATGGATTATTAATC 

TCAAGATTTCATAATCATTTTTAGTATTTAGATTGTGCCTCAAAGTTGTAGTACCT 
CACAATACCTCCACTGGTTTCCTGTTGTAAAAACCTTCAGTGAGTTTGACCATTGT 
GCTCTTGGCTCTTGGGCTGGAGTACCGTGGTGAGGGAGTAAACACTAGAAGTCTTT 

AGTACAAAACTGCTCTAGGGACACCTGGTGATTCCTACACAAGTGATGTTTATATT

TCTCATAAAGAGTCTTCCCTATCCCAAGGTCTTCATGATGCCAGTAGCCATATATG 
ATAAATTATGTTCAGTGATAACTTAGTTATCAGAAATCAGCTCAGTGGTCTTCCCC 
GCCATGATTCACATTTGATGAGTTTTTAAAAATCAAAGTGATTTTGAAAATCTCTA 
ATGGCTCAGAAAATAAAAACATCCAGTTTGTGGATGACTATATTTAGATTTCT 


Sequence ID 864 

TTGTGTTTTTAGGACTCCTTATCTAAATTAAGGCAGAGAAGTTACAGTATTTATAT 
CTGCATTAAATCTCAATTCCAGAAAAACCTTTTGAAAAATTATTTAATCCTCTGGA
AACTATTGATATGATACAGGAGAAATTTTCAGAAGTTTATTGAATAATTTAATATC 
ATTTAATAGGACACTCTGGCTTGTATATAAGCAGATACGTTACTCAGACTTCTTGG 
CTGTACTCTAAAATAATATATGTACTAGTCTCCTAAATATTACTAGCTCACCTTTC 
AAAATGCATACTAATATTTCAATGTCTTTCTTCAATTTGAAAAGCTCTTGAATATC
TACTTGTGATAGCCCTAAGAGCTGAGATAATTATTTCCAGGAGGTTGAATCCCTGA 
TTCTTAACTGTTCAGCAATGCATAAGCAAGAGAGAATATGAC^TAAGAGGACCATT 
TCTACATTAGCCATTTTTTTTCACAAGATACCTATGTGAATACAGGGCACCTGGGA 
GGGTAAGTGGAGGACTATTTCTAACTATATTTATAAGCACATACTGATATTGGTGA 
ATC^y^AACCTACAGCAGTGCTTCTCAGATGGGAAGGGAGAGAATGTGTAAGGAGA
CAGGAATTCATTAG 

Sequence ID - 865 nt: 122 

CC^AATCCACTCTCCAGTCTCGCTCCCCTGACTCCCTCTGCTGTCCTCCCCTCTC 
ACGAGAATAAAGTGTC^GCAAGA2^AAA^^
AAAAAAAAAA 

Sequence ID 867 

TTTTTTTTTTTTTTTTTTTCAGAGTCACAGATATTGTATAGCTGAGGTAAGCATTT 
TACAACTTTTCAGAGACAAGTAAGTACATAAAT

TAATATTTCCACATTGAANAATAGATGTGATAATTAAATCTTTTATAAGGTTTTAA 
AAAGACATGAAAC^TAAACCTAATTATACATAAAAGAAAAGAATTTTAAACAAGAG 
CTTATTGNGATGACATTACTCATAACTTTTACCTTTAAAACCTTTTCTTGGGTAGC 
TATTCAAAAGTAAAGACCACAAGTTTTGTTGCCCANATTTCTTATGTTTNGTATAT

TTAAGCTCTTTATTTATTGAACAGATGNGTCATTAATTCATTNGGAGCATTACTAT

TATCAGTAAAATTTGATTTTTTTTTCCCCTCAGTCATAGGTAAATCAGCTCCACCT 
GGAATTTCTAAGGACCCAGTTTTAGTCAATATTTTCAAGTAATCATGACCTCAGAA 
ATAGTCTTAATTAAGATAACAAATATTAGCCATCAAAATGGAACCAAGACAAGATT
CTAATGTTTGTAAACAGTCAATCCATATTTATGAATATTAGCATATATTGGNGAAT 

AGTTAAGGCAAAAGGGTCTAGCAG

Sequence ID - 869 nt : 667 

TTGTGTTTTTAGGACTCCTTATCTAAATTAAGGCAGAGAAGTTACAGTATTTATAT 
CTGCATTAAATCTCAATTCCAGAAAAACCTTTTGAAAAATTATTTAATCCTCTGGA 
AACTATTGATATGATACAGGAGAAATTTTCAGAAGTTTATTGAATAATTTAATATC
ATTTAATAGGACACTCTGGCTTGTATATAAGCAGATACGTTACTCAGACTTCTTGG
CTGTACTCTAAAATAATATATGTACTAGTCTCCTAAATATTACTAGCTCACCTTTC
AAAATGCATACTAATATTTCAATGTCTTTCTTCAATTTGAA2\AGCTCTTGAATATC

TACTTGTGATAGCCCTAAGAGCTGAGATAATTATTTCCAGGAGGTTGAATCCCTGA 

TTCTTAACTGTTCAGCAATGCATAAGCAAGAGAGAATATGACATAAGAGGACCATT 

TCTACATTAGCCATTTTTTTTCACAAGATACCTATGTGAATACAGGGCACCTGGGA 
NGGTAAGTGGAGGACTATTTCTAACTATATTTATAAGCACATACTGATATTGNTGA
ATC1AAAACCTACAGCAGTGCTTCTCAGATGGGAAGGGAGACAATGTGTAAGGAGAT 
CAGGAATTCATTAGTC^CCTTTCAGATGGTTTAATGCATACAGCTGTACCG 

Sequence ID 870 

GGAGTTTGAGCAGATCCTTCAGGAGCGGAATGAACT

TCAAGGAGGAACTGGCCTACTTCCAGCGGGAGCTGCTCACAGACCACCGGGTCCGC 
GGCCTTCTGCTCGAGGCCATGAAGGTGGCTGTGCGGAAGGAGCGGAAGAAGATCAA 
GGCGAAGATGTTAGGGACACCAGAGGAAGCAGAGAGCAGTGAGGATGAGGCTGGCC 
CATGGATCCTGCTCTCCGATGACAAGGGAGACCATCCCCCACCCCCGGAGTCCAAA 

ATACAGAGTTTCTTTGGCCTATGGTATCGGGGTAAAGCTGAATCCTCTGAGGATGA
GACCAGCAGCCCTGCACCC^GCAAGCTAGGGGGAGAAGAGGAGGCCC^CCACAGT 
CTCCAGCTCCTGATCCGCCCTGTTCTGCCCTCCACGAACACCTTTGTCTGGGGGCC 
TCAGCCGCCCCAGAGGCCTGACTTAGGGGTCTGGCTGTGGAAGGATGTGTGGCCTC 
AAATGAGGACAGGGCTCCCGCCTTCACAGCCCTCGCCAGGGGTCTGCCCCAATCCT 

GGCCTGCATCAGGCAAGGACGGGGTCTCAGC

Sequence ID - 871 nt : 642 

GCAAGTCTTCAGTATGTACATTTATCCCCTAGAAGAAGAAAAATTAGTTGTGCATG 

AAAAAGAAACATTAACTGCAAAGCTAAATGCTCACACTCTAAATCAGTGCTCTCCA 

AAGTACAGCAGGCGGGAAAAGAA7VATGGTAGATTTTTTTCTTCCAATTACTTTAAC

TTATTCTTTTTAATGGACACTTCATACATAAATATATTCACAATATATTAATATAT 
ACATAATGTATAAGCATACATATTGAATGTGCAGTCAAAAAATGTACTAATGGAAT 
GCTCTACCAAAAGAAGTTCACGTTCATCTGTAAAATGGGAATAATATTTTTAAAAG 
GCATACAGTCTGAACATTTTTAGATTATTCATAAAATCTATTCAGAAAGTTAAACT 

AAAAAATTTAACGTATGCCTATAACAAATTTTGTACTTAATGTAATTGNTTTTCAT

CCTGAGATCTAATATCCTCGTTTTTAAGTAGAGCCACTTGTTTGCTACAGTTTAGT 
CAAAACGTTAACATTAGATGGGTAAAGTAATATGAAATCTTTCTACTACTCCAAAA 
TAGAAAACAGAACATTAAAAAGATAAAAATTCAAACATACTTACCAGTAGATTTTC 
AACTGNGCAAAAGCTCATTGCATGGG 


Sequence ID 873
GTTTTCCACCGTGAAGAGAACATTTCCTC 
CTTTTATTTCTATTGGAAGATGCCCATCATACTTCTGGCAGGATA2\AATGATAAAT
TTATTTATTCAACAGATGATACTCAATTCCCTGCTGTTTTACTAAAGGTTCTTTAC 
GTTTTATAGAAGCTAAATTTACTGTCATAGAAATTGCAATTGTAGATGTTACTGTA 
ATCTAGTCAGAATATCCTTATCCTTCTAAAATAAAACTAGTTAAAATTATTAACAT 
ACGTACTGATATTAATTTTTAAGTTTAATGCTGCCACGTGCTTCTGCTAAGAACAT
TTATCACTACAAGTGGCAGAAAATTCCAAACTCATCAAAACCAAACTGTTGCTTCT 
TCCCTGCTTTTTCAGAAAATGAGAAAGGATGACTTTATTCCAACATATTCTAAAAG 
TATTCCAAGAACACTACCTTTATTCTAAATTCGTTATTTTCACAAAATAAAGGCTG 
CAGATTGAAAGATAAAGGATTGCTATTAAAGAACAAAAGAAAACAAAACCGAGAGA 
GAAGGAGAGCTAGGGAAATCCCTGCANAANAACCGAATANGGTCCCTCTATTCTGG
GCCGGGGCCTGAAACTATGAAACAGGCCAACACAGAATCTTGGCA 

Sequence ID 875 

CCTCTGACTCGGTCAGCTCACCCACGCTGCTGGCCCTGTGAGGGGGCAGGGAAGGG 
GAGGCAGCCGGCACCCAGAAGTGCCACTGCCCGAGCTGGTGCATTACAGAGAG(3AG
AAACACATCTTCCCTAGAGGGTTCCTGTANACCTAGGGAGGACCTTATCTGTGCGT 
GAAAC^CACCAGGCTGTGGGCCTC^GGACTTGAAAGCATCCATGTGTGGACTCAA 
GTCCTTACCTCTTCCGGAGATGTAGCAAAACGCATGGAGTGTGTATTGTTCCCAGT 
GACACTTCANAGAGCTGGTAGTTAGTAGCATGTTGAGCCAGGCCTGGGTCTGTGTC 
TCTTTTCTCTTTCTCCTTAGTCTTCTCATAGCATTAACTAATCTATTGGGTTCATT
ATTGGAATTAACCTGGTGCTGGATATTTTCAAATTGTATCTAGTGCAGCTGATTTT 
AACAATAACTACTGTGTTCCTGGCAATAGTGTGTTCTGATTAGAAATGACCAATAT 
TATACTAAGAAAAGATACGACTTTATTTTCTGGTAGATAGAAATAAATAGCTATAT 
CCATGTACTGNAGTTTTTCTTCAACATCAATGGTCATTGNAATGTTACTGATCATG 
CATTGGTGAGGNGGTCTGAATGTTCTGACATTAACAATTTTCCAT

Sequence ID - 876 nt: 115

AAACTTTTGTGGCAACAGTGCACTAATTTGGATAATGTTTGTTCCCAATAAATTAA 
GAGCCAAATTGTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
AAA



Sequence ID - 878 nt : 634 

GCCAGGCTTTGTGAATTACAGGACATTTGAGACAATCGTGAAACAGCAAATCAAGG 
CACTGGAAGAGCCGGCTGTGGATATGCTACACACCGTGACGGATATGGTCCGGCTT 
GCTTTCACAGATGTTTCGATAAAAAATTTTGAAGAGTTTTTTAACCTCCACAGAAC
CGCCAAGTCCAAAATTGAAGACATTAGAGCAGAACAAGAGAGAGAAGGTGAGAAGC 
TGATCCGCCTCCACTTCCAGATGGAACAGATTGTCTACTGCCAGGACCAGGTATAC 
AGGGGTGCATTGCAGAAGGTCAGAGAGAAGGAGCTGGAAGAAGAAAAGAAGAAGAA
ATCCTGGGATTTTGGGGCTTTCCAATCCAGCTCGGCAACAGACTCTTCCATGGAGG 
AGATCTTTCAGCACCTGATGGCCTATCACCAGGAGGCCAGCAAGCGCATCTCCAGC 
CACATCCCTTTGATCATCCAGTTCTTCATGCTCCAGACGTACGGCCAGCAGCTTCA 
AAAGGCCATGCTGC^GCTCCTGCAGGGACAAGGA(^CCTAC^GCTGGCTCCTGAAG
GAGCGGAGCGACACCAGCGACAAGCGGAAGTTNCTGAAGGAGCGGCTTGCACGGCT 
GACGCAGGCTCGGCGCCG 

Sequence ID 879 

GTTGCCGGGTCCTGTGATAACTCTGTTTAACATTTTGAGGAACTGTTGAAT

TTCACAGCAGCTGCCTCATTTTTTATTCCGATCAGCAGTACTTCTTGGTTCTAATA 
CCTCCACGTTCTGGCCAACACTTGTTGTTGTCTGTAATTTCGTTGTTAGCCATCCC 
AGTGGGGATGAAGTAGTATCTTACTGTGGTTTTCAGTTGCGTTTCCCTGATAATTA 
ATGATGGTGAAC^TCTTTTCATGTTCTTGTTGGCC^TTTGTATGTCTTCTTGGG^ 

AAAAAAAATGTCTGTTCAAATCCTTTACAAAGTATTTATTTTTTATGTCAACAATA
TAACCACTCAGTACACTGCTTTTTJ^ACAATGATCTTTTAAAGGTTTGTTTACIAA.C 
ATTTAGCACTTGAAATTTTAAGGTTATGCCCTCAAAAAAATTGCTGAGGGAGCTAA 
GCTATGAAGATGCAAAGGCATAANAATTATACAATGGACTTTGGGGGAATCCAGGG 
AAAGGGTGGGAGGGGGGTGANGGA 


Sequence ID 881 

TCGACTCTGATTTTTTTTTCTCCTTCCTCGCAGCCGCGCCAGGGAGCTCGCGGNGC 
GCGGCCCCTGTCCTCCGGCCCGAGATGAATCCTGCGGCAGAAGCCGAGTTCAACAT
CCTCCTGGCCACCGACTCCTACAAGGTTACTCACTATAAACAATATCCACCCAACA 

CAAGCAAAGTTTATTCCTACTTTGAATGCCGTGAAAAGAAGACAGAAAACTCCAAA

TTAAGGAAGGTGAAATATGAGGAAACAGTATTTTATGGGTTGCAGTACATTCTTAA 
TAAGTACTTAAAAGGTAAAGTAGTAACCAA^GAGAAAATCCAGGAAGCCAAAGATG 
TCTACAAAGAACATTTCCAAGATGATGTCTTTAATGAAAAGGGATGGAACTACATT
CTTGAGAAGTATGATGGGCATCTTCCAATANAAATAAAAGCTGTTCCTGAGGGCTT 

TGTCATTCCCAGAGGAAATGTTCTCTTCACGGTGGAAAACACAGATCCAGAGTGTT

ACTGGCTTACAAATTGGATTGAGACTATTCTTGTTCAGTCCTGGTATCCAATCACA 
GTGGCCACAAATT 

Sequence ID 883 

TCATTTACATTAATACTCAAAACTGCTCGATTAAGCAGGTGCTGTTCTTATCGCCA
TTTTGCATATGATGAGAAAGGGTAAGGTCACCCAGCTAGTATTTGGCTCACAGCAG 
GCCTTAAGACTTGGTTTGTGTGACTCATCAGTCCACGCTCCTAAAACCACTAAGTT 
GTTCTACCCTTTAATGTTGAATTAACATTGGATAGTGTTCAAGTTTANATGGGTGG
GTGAGGGCCCAAGGACCTTTCAAACTCAGATCTCTTATTTAATAACCTGGTCCCAG 
ATCCATTCCTCTGTCGAAGAGGAAGTCATCCTTCAGTGGCTATTCATTGTGGGGTT 
AAGAGCGCAGACTATGAATTCAGTCTTTTTGGGTCCCAGTTTGCCAGACCTTGAGT 
GAGTGCCCCGAGTTTACTTACTTGTAAAGGTAGGTGGAGGTAATATAATTAAATAA
ACTTAAAAAACTAATTAAAAACAAAACAAATGAACTAAGGTCTTAGGATATCTGGC 
GTCTATTTTGCGCCAAATCACATAATGTCTATTGTTGTGTGTTGGACTATAGGATT 
GTCCTTTAACAGGGAAGGGTTTATTTCTGTAATCAAGTCTGTCAATATTATGACCA 
TGTTGATAATAGCTACCTTTAATTGAGGGCTTCCATGTGCCAA 


Sequence ID 885

TCAGTGGAAAAGGGCAGGTTGAATCAAGGTGAATCA^ 

GCCTGCCATCGCTGTTCCTTCAACTGAGTGCTGCACATCATGGGCTCTGTCTGTGA 
GAGAAAAATCCCGGTGCTTGGTGTCCTTGCATGACATGGAGTTTTGCATGTAGATC 

AATTTAAI^TGTACCTCTTGTTTACATAATTTGCATAATTTTAAAAGATAATGTTG
CCAAACTTTGGAAATGTTAATGTTCANACTGAAAATCTCCACTACATGTAACTTTC 
TTCCTCTGGATCAGTGGCATGGCTTATAATCCCAGCCAGTGGTTTGAACTGTTCCA 
GTGTCAACTGCCATGTGCTCTGCTTCAAGGGGGAACTAGCCTTTTGTGAATTTTTT 
GTACATAAGTATTTGTTACAAATATTTTAGCAAATGCTTTCTATTTCTCTTGCTTG 

TGCATATCTTGGCTGGCGTTACAGAAAAATAGTGTAAACATTATTTCCTTACCGGG
GAATGAGGGTTTT 

Sequence ID 887 

AGCACCTGGCACAGAGTAGTAGCTAACACAGATGTTAATTTTGCTGCGTCAAATGT 

TTTCACTTTGAATCTCTCTTGAGTATTGTTCTCCTTATTGATTACATGATGACATC

CTGTTTTCTCTCCCTGACCTTTACTGTTTGTTTAGAAAAAAAAAAAAAAAAAAAAA
AAAAAA 

Sequence ID 889

CAGAGAGCTTGTTCCCTCCCTCCCTGTGCATGCAAACAAGAGGGCATGGGAGCACA

CAGAGAGATGGCAGCCACCTACAAGCCAAGAGGAGAAGCCTCACAATCAAACTCTC 
GCTGCTGGCGAGAGTCTTGGACTCTGTCTTGGACTTCCAGCCTCCAGACTGTGAGA 
AACAAATTTCTGTTGTTTCAGCTTCTCAGTCTCTGGTGTTTTGTTATTGCAGCCTG 
AGAACACAGCTGTACNATTATNAGGGAAACAGAAAACACTGATACTTAACAATGCT 
AATGCAATTATTTATTTGCTTTTCAGTCTCTACAAAACGTTCTAAAACACTAATCT
AAATATTAACAGTAAAATATTTGCATAACTAATGGAAACTAAGAAATCATATGACC 
AATATTTCACTTATTGGTAATCTTACTCTACTGATTTCCCCCCAGACTGTGATTTT 
TGAACTTCCTTGCCTTTCTCCTGTCTTTCTGNGTTTATTCATGGAATTCCAGTTAT
CTGGGCTTGAAATTGCAGGCTCTCCT 

ATGAGATAAATGTTTCTTTTTTCTTTCTGACTGCATTAAATCAGATACAACTCAGC 
ATTAAAAAGCTATCTTTGNAAAATGNTGGTACTAATAAATTAGTCTTA 


Sequence ID 890 
CCAGTTCCACATTCAGTC 

TTAAATCACAGAAGTTGCAAATGCCACAAATCAAGGTCTTTTTCTCTTGGAG 
TGTTAAACATTTACCAACTCACGACCGCCATGCACCCAATACTGCAATAGGTCTAT 

AGATGCAGATACTGTCTCCATGAATGTTATAGGCTAGAAAGGAAATAGATAAGTAG
TCCTACCAGAAGAACATGATGAAGGCATTTGTGGTAAACAGAATGATGGCCCCCCA
AAGATGTCCACATCCTAATGCCTGAAGCCTATGAATATACTACTTTACTTGGCAAA
AGGGACTTTGCCACAGGTTTTTAATTAAGGACCTTGAAATAGAGAGATTATCCTGG 
ATAATCCAGATGGCCCCAGTGTAATCCCAAGGGTCCTCACAAAGGGTAGGAAGGAG 

AGCCAGAGT(^GAGAAGGAGACGTAGCAATGGAGGCAGAGGTCANAGAGAGATCTG
CAGATGCTGCTGTGTTGGCTTTGAAAATGAGGAATGCAGGTGACCTCAANGNGCTA 
GATGATGCAAGGAAACAAATAATCTCCTATGAACCCTAGGATGGGCATTATTATGA 
GTCCTATTTTATAAACAAGGAACTGACNTCCAGAAAGATAAATGC 

Sequence ID - 891 nt : 626

GGCAGAGGTTGCAGTGAACTGAGATCATGCCATTGCAATCCAGCCTGGGCAACANG 
AGTGAGACTCCATCTCAAAAAAAAAAAAAAAAAGACAAGAGTNTCCACTCTAAACA 
CTTNTATTCAACATAGTCCTGAAAGTCGTAGCCACAGCAATTTAACAAGATAAAGC 
AATAAAATGTATTCAAATAGAAAAAGAGGAAGTCAAATTATCTTCACTGGNGATAT

AATTCTCTACCTGGGAAACTTCACCGAAAAAGATTTCACCAAAAGATTTCTAAGCC

TAAATAATGACTTCAGCAAAGTCTCACCATAGA?^AATCAACATAC^CAAATGAGTA 
GCATTTCTGTGCACCAATAATATTCAAGCTGAGAAAAAAAGAACATGGTTCTATTT 
ACAATAGCTACAAACAAAAAAATATGTACCTAGTAATACATTAAATCAAGGNGGTA 
AAATATCTNTACAACAAGAACTACAAAACTC 

TAAGTAAAAAGGCACTCCATGCTCATGAATTTAAAGAATCAATATAATTAAAATGT

CCGNGCTGCCTAAAGCAACTTACAGATTAAAGGCTATTTCTCTCAAACTATAAATG 
CACCTTTTTA 

Sequence ID - 893 nt : 585 

GTCATTGCTGGGTGGCGCCAGCCCTCAGACTTGCCTCTTTGCAGTAGGAAGAAGGC
CTCCCCACATACCTTCCCACACTCATCACCTTAAGCCAGACTCGGTGTCCAGTGAA 
TATGACCATCTCTTGCCCATTTTCTAATGAGTGTTTTCATTAATGAGTTATAAGAA 
TGTGGTGGGTAAATCTATGGGCTTTGAACTAGTGAATCAACTTGGTTTCAGAATCT
GGCACTGCTACTTACTAGTGAATTTAAGCAAGTTATTTCACCTTTCAGAGTGTGAG 
TTCCCTCATGCATACAAGGAAGATAAAAAATAATGTNTACNAAAGTATTGGAGTAA 
TTAATACATGGAGAACTACATGTAAAGCGTTTAGCATGATGTCTGACATATTAAGC 
ATCCAATATTAGTNGCTTGCAGAATTATTAGTAAAAGAGATTGCTTCTGAAAGCCA
TTCCAATTCTTAAATTTTATAATGCCACATTTGAGGTCACCTGAAGTCGTGTATAA 
CATGTGTACATTTTTGCGATTTATTTTTTCAATTCCCANATTAAAGGCATAGAGAT 
ATCCTAGCNANGGACTCCAAGTGTG

Sequence ID - 895 nt : 560

GTAATTGCAGCCTGGGCAACGGAGTC 

AAAACTACTGAGGTAGTTGAATATATCCTCCATTCCCCATTTGTGGATTAGTTAGT 
AAATGGGGCATCTTAGGGTTTAAATATGTCCAGGGTCACTGAGGATCAGATCCTAG
GGTTCCTTTGACTCAAGGCTTTTGTC
CTTTCTGAGGCAAGTAGCAGGGTGGCTACTATGTATCGCTTCTTTATTTTTTCTTT
TTTAAAATAATGCAGGCACCGTGCGC^ 

AAAAAAAAAAAGCTGTTCTCATCTCCTGTCTTTCTTTTTTTTTTCTTTTTATTTTT 
TTCTTTTATTATTATTATACTTTAAGTTTTAGGGTACATGTGCACAACGTGCAGGT 
TTGTTACATATGTATACATGTGCCATGTNGGTGAGCTGCACCCATTAACTCGTCAT 
TTAGCATTAGGTATATCTCCTAATGCTATCCCTCCCCCCTCCCCCCTTTTTTTTTT

Sequence ID 896 

GGGAATGTCTTAGGCACTGGGACTGTAAGTGCAAAGACCCTGTGGCACAAGGGAAT 
GTTAATTATCTACCTTTCANAAACTGGAANAAGGCCTAGCCTAGAGCATTGAAAAC 

AATAAGGGAAAGGAGGAGTAAGGCTGGANAGATAGGAATGGTTTAAAGTCTTTGTT

AAAAATTTTTTTAAAAAAATCTTTATCACAAGAAGAGGATTGGCNTGATCAAATTT 
GACTTTTAAAAANATTACTTGGGTTGGGCATGATCAAATACTACTTAGGGAGATTA 
GTTTANATGATAATGGCATTCTGGACCANAGTGGAGTCAGAGGTGAAAAGAGGTAG 
ATATTCCANAATTGAGGGATTTGTGAGGTGAAATCATTTGTTACAGATATTAAAGG 

ATAAGGAGCTTTGTCAAAGGGGATCTTAAGTTTCTGGTATGGTAACTGGGTTAGAG

AGCCCTGGAACATGACCAGCTTTAAGGGAAGAGAGCTTGAGCTCTGTTCTTGTTAA 
GCTCAGTTTGAGATCTTTGTGGAATCAAGTGGAGAGGTCTAAGCAGGGAACTGGCT 
TGGCTAGGCTGTAAAGATGAATCTGAGAGTCCCAAGAATATGGTAATTATTAATAA 
AAGCCTTAGGTANATGAAATTGTTTTGGG 


Sequence ID - 897 nt: 509 

GCAAATCTACACATTTGATTAAATGATA 
AATGCTAGTTTCTTGGTTTTGATATTGTACCATAGTTATGTAAGATGTAACCATTG
GGGGAAACTGGGTGAAGGCTACATGAGACCTCTCTGTACTTAATCTTTGCAACTTA 
TGTGAATCTATAATTATTCCAAAATAAAAAGTTTTAAAGAACCTAAGTATCCTTAT 
TACTGAGGGTCATCGTGCTAGAGAGCAAGGTTGGGCCAGAGCTTCTAGTTATTTAA 
AATACTAAATACCAGCCTGGGCAACATAGCAAGACCCTGCCTCTACAAAAAGCAAA
AAAATTAGCTGGGCATGGTGGTACATGCCTGTGGTCCTAGTTACTCTTGGAGGAGT 
CTGAGGTGGGGAGCTTGAGCCTAGGAGTTTGAGGCCGCAGTGAGCCTTGATTGTGT 
CTCTGTACTCCAGTCTGGGCCACAGAGCAAGACCCGGTCTCTAAAAATAAATAAAT 
AAATA 


Sequence ID 898 

ANTGCACTCCAGCTTGGTGACAGAGGGAGACTCC^ 

AAAAAAAGGGAGTAGCTTGAAGCCAC^ 

TCCCACAACTCACACCAGCACCACAAGCTAGCCTTTO 

CTTTCAACGCACACACCCCTGTGTCA^

CAGATACTGTTGAGTCCCTGGCCTGCCTATGAGAACGGCTCATGATCTCTATTTCT 
TCTGCTTAATGACCATCTCGAAGTAACAAGTTTAGCCTAAAATAAACTTGCTAAGT 
TAGCAAAGGAAGTCCTTAGCAGCCACCATTTCTCGATTCCTCGATCACCTCCCCTG 
CCCCTCAACTCCCTCATTTCTCCCAAGATATGGGCTCCAGGCTGGGCGCGGTGGCT 

CACGCCTATAATCCTAGCACTTTGGGAGGCTGAGGTGAGCAGATCACTGAGGTCAG
GAGTTCG 



Sequence ID 899 
TCNTTCGGAACGCGCC 


Sequence ID 900 

CTGGAGGGATGGGTAGGATTTTGACAAGAGTGGTTGAAGGTATTCTAATTCACTTA 
GTACCTACATGTGCGAGGCAGCATGAAGGCAAAAAAGCCTGGGGCATGTTCAGAGA 
ATAGCAAGTATTCTAGTTTGAGTGGCACCTGGTACGTATATAAGGGAATAGTAAAA 

GATCTGGCTGGAAAGGAAAAGTAGGGGCAGGTTACGAAGGACCTCTGAAAGTCAGA
CTGTGGAACTGGAACTTTTATCAGGAAGCAGTAGTTAGTTTTTTCAAGCAAAAGCT 
AATTAGAGTTGATATTTAGGAGGATGAATCTAACAGTTGTGTGCAAGGATGCCTTC 
AAACTGAGTGAGACTAGTACTGGAGACTGGTTAAGAGACTACAACAATAACCTGAG 
TAAGAATTAATACAGGCCTGACCTAGTTTTGAGTGAGTAGGATTGGAAACAAGAGT 

TTTAGGTATTATAGGATTTATGCATATAAAATGGACTTGACAGAACTTGAAGAAAG
AGAAAGTGTC3AAAA.GGACACAGAAAGTGAGGCAGGATATCTTACAATGTTAAAGGA 
AAGGAATAATAGAAGTTAC

Sequence ID 903

GGAAACATAAGCTTGTTTCAGTACACTCACGCTGTAGATTAATTCTGATATTACAT 
ATCTCCATCAGACTTTGTACCCTCTCTCTTCCATCCCTTACCCTTACCGATTAGGT 
TGGTATTACCTAAAAATCCATAGAAAATGTCCAGGTGAATTGCCTTATGCTTTCTA 
CCCCATAAGGTATAATT

Sequence ID 904 

CTCTGTGGTGTGAGAACACAGTGGGTGACCAAGGCTTTCCAGATGAACCCAAGGAA 
AGTGAAAAAGCTGATGCTAATAACCAGACAAGAGAACCTCAGCTT^ 
CCAAGTGGAGGCACTCTTCAGTTATGAGGCTACCGAACCAGAGGACCTGGAGTTTC
AGGAAGGGGATATAATCCTGGTGTTATCAAAGGTGAATGAAGAATGGCTGGAAGGG 
GAGTGCAAAGGGAAGGTGGGCATTTTCCCCAAAGTTTT^ 

TACAGATTTGGAAAGCACTCGGAGAGAAGTCTAGGATGTTTCACAAACTACAAAGC 
TGAAGAAAATGAAGCCCTATTACTTGTTTGTAAGATTTAGCACCCTTGTGCTGTAT 
ACTGTACTGAGACATTACAGTTTGGAAGTGTTAACTATTTATTCCCTGTTAAAATT
TAACCTACTAGACAATGATGTGAGTACCCAGGATGATTTCCTGGGGCACAGTGGGT 
GAGGAGATGGGGACAGGTGAATGGAGGAGTTAGGGGAGAGGAAAAGTGGATGGAAG 
TGTCTGGAAAGGGCACCAAAAAAGTCTTCCAGGTCTGATCCTGTTTCTTGCTCTGA 
GTGCTAGCTACCACTGTGTCACACTGTAACATN 


Sequence ID - 905 nt : 655 

CTCAGCTCTTGCCTGGTCACCTTGTGGCTTTTACCATCCTCATCCCCTGTGCCACC 
CACATCCTGCCACTTCTGCATGGAGTTGGGGTGGGGCCATTGGAGAAAAGAGGTTA 
AACAAGCAGTAATTTACTTGAGTACAGTCTTTGAGCCAATGAAATGCCAGTCATCA 

TTTCCCAGGGGTACTTGTCATCTTGTCAACAACCCGCTGATAATGCTCCTTCAATG

TGAATAGCAAAAGTAGGGAGAGACGCTGAATGAAGAAGATGCCTACCCCTCAGGAA 
GACTGCTGTCCGCCTCCAGGCCTGCATGCACACACCCATGCCCACCTGCACCCCCA 
GCACCACGCCCACACTCACTCGCACACACCCACATGCCAGTGTTTTGGGGTTGGCA 
GCCTGGACACTGCTGAGGCAAACACAAGTCAT 

TCTGTCTCTGTTTTAGTTACAGGAATTTGGTCAGTTTAGAGGATTTAATAAGTCCG

TGGAAAATTTGTTTGTGTCTCTTGCTACCCACGTGAAAAGTAAGTGCATGCTTCAT 
GATGTGTTTTCCCACTACCTTCCAGGCCAGCCGAGCCCACTGGCCANGGCCTGGCC 
CGGTGACCTCGGTTGACACTGTCCTCANGCCACTCACTT 

Sequence ID 906

CAGAATTTCATGTTTATGCTGCAG^GGCCTGTATTTTATAATGGTGGCTCTTTTG 
GACGATGACTTCCTCGATGGTGAAACTTCCAGTAATCTCCCTCATCATACTGAAAT 
GATATCAGTATATCATCAGAACACCATGGAGCTTGTCATTTGAGGGACACAGCTTG
CTTGTGTGCTTGGGAAAGAAGAGGTTTAGCATGGTTTCAGGTCAGTGATGAGTCCIA 
ATGATCTCTGCAAGTTCCCTTAGCTCTGANAATTCTGATGTCATATGCACTTCTGC 
CGCCAGAGTTGCTGCTTACTGGATGCGTAAGAAGAAAAGAAAAAAAAAAAAAAA 


Sequence ID - 907 nt : 582 

CTTCCATTGGGGGTAAAGATCAAACTTTAGGCGAGCCAGGTCTGTATCTCCATTCC 
TGTCTCTGACTGCTTCCCTGTAGGGATTGTCTGCAAGCGCACACCTGCATTTTCTT 
GTCCACAAGTCTATGCTCTAACTCTGTCACCTGCATGGCTGCAAATTAGCTTCCTT 

CTTCCTGCCCTCTTCTCTCTAGCTTGGATTTTGAATTTGAATGGCAGGCATGGGAT
GTCCGTGTGTGTGTACTGCTGATGTGTACAGCCGCTTGTTAGCGCTCTCATTGTCT 
TCAAATGTAAGTCATTTTGGCTGGGTGCGGTGGCTCATGCGTATAATCCCACGCTT
TGGGAGGCTGAGGTGAGCTGATCATTTGAGGTTAGGAGTTCGAGACCAGCCTGGCC 
AACAT^O^AACTCGATCTCTA^ 

GCACGCCTG^AATCCCAGCTACTTGGAATGCTGAAGCAGGAGAATT

ANGAGGCGGAGGTTGCGGTGAGCCAAGATCACGCCACTGCACTCCAACCTGGGTGA 
CAGAGCAAGGCTGTGTCTCAAA 

Sequence ID 908 

ACCTGACTTCAAACTATACTACGAGGCTACAGTAATCAAAACAGCATGGTACTAGT
ACAAAAACAGACCAATGGAACAGAATAGAGATCTCAGAAATAAAACT^ 
GZ^CCATCTGATCTTCAAC^AACCTGACAAAACGAGCAATGGGGAAAGGATTCCCT 
ATTTAATAAATGGTGCTGGGAGAACTGGCT-AGCCATGTGCAGAAAATTGAAACTG 
GACCCCTTCCTTACACCTTATACAAAAATTAACTCAAGATGGATTAAAGACTTAAA 

TGTAGAACCCAAAACGATAAAAACCCTAGAAGAAAATCTAGGCAATATCATTAAGG

ACATAGACATGGGG?^AAATTTCATGATC 

GCAGAAACTGACAAATGGGCTTCTGCACAGCAAAAGAAACTATCGTCAGAGTGAAC 
AGACAACCTACAGAATGGGAGACAGTTTTTGCAATCTATCCATCTGACAAAAGTCT 
AATATCCAGAATCTACAAGGAATTTAA 


Sequence ID 910 

CAAAAAACAAGAATTACCCGGGCTTGGTGGTGCATGTCTGTAGTCCTATCTACTCA 
GGAGGCTGAGGCTGAAGGATCACTTGAGCCCAGGAGTTTGAGGCTGCAGTGAiGTGA 
GCCATGATCATGCCAGTGTACTCCAGCCTTGGCAGACTGAGCAAAACTTGGTCCCT 

CGCAAAATGTTGAAGCCCAGTTTTCACTATTAACCTGTATTTCAGTTTCCCCATGC

TAACTTTGAAACACTGGGGCTGGCCTGAGGGTATAAAGGCTTATTCAAACTCAGTA 
ATTTAAACTTAAAATCCTAAGGAACTTGAAAAAGTGTAATCTAGTCCAAATGGGGC 
ATCAATTCTAAAGCATTTGCTTGTTTGAGCAGATTTTCTGTGTCTGAGGTATATAG
ATAACTTATCTTTTTATGACTAAATCCAAGTCCTTAGTTCCTGTTGGAATTCAAAA 
TCATATTTAAAAATTGATGCTTTGTTCTATAATTAATGCTTTGATTGTATAAATAA 
TAAGTATTCTTCCAAATCCCTTTTTACAGATGATGATTCTGATACCGAGACGTCAA 
ATGACTTGCCAAAATTTGCAGATGGAATC

CTGGNTCCCAGTCCTGTNCTTAAAATTCTAACTCGAC 

Sequence ID - 911 nt: 595 

GAGGGTGTAGAAGAGAAGAAGAAGGAGGTTCCTGCTGTGCCANAAACCCTTAAGAA 
AAAGCGAAGGAATTTCGCAGAGCTGAAGATCAAGCGCCTGAGAAAGAAGTTTGCCC
AAAAGATGCTTCGAAAGGCAAGGAGGAAGCTTATCTATGAAAAANCAAAGCACTAT 
CACAAGGAATATAGGCAGATGTACAAANCT 

AAAAGCTGGCAACTTCTATGTACCTGCAGAACCCAAATTGGCGTTTGTCATCAGAA 
TCAGAGGTATCAATGGAGTGAGCCC^ 
CTTCGTCAAATCTTCAATGGAACCTTTGTGAAGCTCAACAAGGCTTCGATTAAC^T
GCTGAGGATTGTAGAGCCATATATTGCATGGGGGTACCCCAATCTGAAGTGAGTAA 
ATGAACTAATCTACAAGCGTGGTTATGGCAAAATCAATAAGAAGCGAATTGCTTTG 
ACAGATAACGCTTTGATTGCTCGATCTCTTGGTAAATACNGCATCATCTGCATGGA 
GGATTTGATTCATGAGATCTATACTGTTGGAAAAC 


Sequence ID - 912 nt: 651 

CATTTCCAGAGTTTATGTGAATTGAATTGAACTATGGTTTTATGTTACTGTCAGTA
GAATGAAGTACGAATATTTGAAAAATACACCTTCAACTTCAAAGTGATTCTTGACA 
AAAATTATAAGGAATCATTTTGGACACATTTTCTGGTAGAGCCTTGTAAAAATTAA 

AACCAAGTGTTGTTTTCAAGAAGAACTGTAATACATAATCAGGAATTTGAGTAGGG

AGATTATTTTGTTATTTAT^AATTAAAGTGGCTGTGTAGTTTTAACTTTAGTATTGC 
AGGTAGAGTAAGCTTACATGATAACAAAAATCTTGGTCTTAGTGACTTAATGATTC 
TGATATTTATTGATTGATTGGTTATCATTCCAAATATTTTAAAAGATAATAGCTGG 
CTGGGTGCGGTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCCAGGACGGGCG 

GATCACGAGGTCAGGAGATCAAGACGATCCTGGCTAACACGGTGAAACCCCGTCTC

TACTAAAAATCAAAAAATTAGCCGGGTGTAGTGGCGGGCACCTGTAGTCCCAGCTA 
CTCAGGAGGCTGAGGCAGGAGAATGGCATGAACCTGGGAGGCGGAGCTTGCAGTGA 
GCTGAAATCGTGCCACTGCCTCCACCTGGCGACAA 

Sequence ID 913

GTGAGGTGGGGACTTCATTCATTGTCCTATTTCTATCTCCACTTTGTGCCTGGAGA
GCTTTCAGGGGAGGTGGAGGAGGAGGGTCTGCCAAGCTACTGCAACATCTGTCACC
CACTATACCCAGTTACTTGGGGGAGGACAGACACTGTGGTGTCATTAAAGTTGTTT
GAACCAAAGTGGCGGCTGCATCTTTGTCCCGATGCTAGCCGTGCCGGTCTCCCATC 
ATCCGCTCGCCCTCCTTTNCCCTGGGCTGCGCCCACTTGTCTTCCTGGATATTTGG 
GGGTGACTCGCCATGCTTGGCACCCTCTGCTTCCTGGTGCTGCTCTGACTCGAAGA 
CGGGACAGTCCCTGGTGCACATCCAGGGAAGAGGAGTGTCGGTAGTTCTTGCAGTA
GGCACTTTATCAGGACCTGACCTGTTGCTGGGTGATTTTAGTCTCTACAAACAGAA 
AGCGTTTCAAAGCGTCAGCTGTGGGAGCAGAGTGACCCTTTGCTGATGCTGGGGGG 
AGGGGATCTAAATCCTCATTTATCTCT 

Sequence ID 914

GGCGCCTGCTGGAGGAGGAGAGAGCTCTGCTGGCATGAGCCACAGTTTCTTGACTG 
GAGGCC3ATCAACCCTCTTGGTTGAGGCCTTGTTCTGAGCCCTGACATGTGCTTGGG 
CACTGGTGGGGCTGGGCTTCTGAGGTGGCCTCCTGCCCTGATCAGGGACCCTCCCC
GCTTTGCTGGGCCTCTCAGTTGAACAAAGCAGCAAAACA^ 
AAGATTANAAGCCTGGAATAATCAGGCTTTTTAAATGATGTAATTCCCACTGTAAT
AGCATAGGGATTTTGGAAGCAGCTGCTGGTGGCTTGGGACATCAGTGGGGCCAAGG 
GTTCTCTGTCCCTGGTTCAACTGTGATTTGGCTTTCCCGTGTCTTTCCTGGTGATG 
CCTTGTTTGGGGTTCTGTGGGTTTGGGTGGGAAGAGGGCCATCTGCCTGAATGTAA 
CCTGCTAGCTCTCCGAAGCCCTGCGGGCCTGCTTGTGTGAACCGTGTGGACAGTGG 
TGGCCGCGCTGTGCCTGCTCGTGTTGCCTACATGTCCCTGGCTGTTGAGGCGCTGC
TTTAACCTGCACCCCTNCCTTG-CTCATANATGCTCCTTTTGA 

Sequence ID - 915 nt : 230 

TTTGAGACCAGCCTAGCCAACATGGTGAAACCCCATCTCTACTAAAAATACAAAAA 
TTAGCCGGGCGTGGCGGCACATGCCTATAATCCCACTTACTTGGGAGGCTGANGTA
GGAGAATCGCTTGAACCCANANAGGCAGAGTTTGCAGTGAGCCGAGATTGTGCCAT 
TGCACTCCAGCCTGGGCGACAGAGCGAGACTCCATCTAAAANAAAATAAATGAATA 
AAATAA 

Sequence ID 917

NNCAGATTTTTTTTTTTTTTTCAGNGTTAGACCATCTTTCAATTCCTGGAACAAAC 
TTAACTTTCCATGATATGTATTTTTTATACATTGCTGGATTTTATTTGCTAATATT 
TTACTTAGGATTTAATTTTCTAAGTNGACCTATAATTNTCCTGTATAAAATTGCAT 
TTGTCACATTTTAGTATCAAGGTTGTCCTANCNCCATGAAATGGATTTANAATGGT 

TTATGTAANATAAAGTACATTTCTTCTAAAGGTTTGNGTGGATTAACTTTCAAATC
TGCCANAGNGNGTTTTTTTCCTTTTTTTTTTTTTTTCATTTNAAGGGAGNGCAAGT 
ANCTTTTCAAATNCTGATTTAATTTTTAAAATATTTNCAAGTNTNTTTANAGTTTT 
TATTTITOTNTNGAANGTTJU^CATTTTTATANAAAANGGTI^TATCTTTTTAAATTC
TTTGACATCAGTTTCTTCANAATTCCTTCTTTTAA 

Sequence ID 926 

GTCATATCTCTTCCCAGGGAAAGCAGGAGCCCTTCTGGAGCCCTTCAGCAGGGTCA 
GGGCCCCTCGTCTTCCCCTCCTTTCCCAGAGCCATCTTCCCAGTCCACCATCCCCA 
TCGTGGGCATTGTTGCTGGCCTGGCTGTCCTAGCAGTTGTGGTCATCGGAGCTGTG 
GTCGCTACTGTGATGTGTAGGAGGAAGAGCTCAGGTAGGGAAGGGGTGAGGGGTGG 
GGTCTGGGTTTTCTTGTCCCACTGGGGGTTTCAAGCCCCAGGTAGAAGTGTTCCCT 
GCCTCATTACTGGGAAGCAGCATCCACACAGGGGCTAACGCAGCCTGGGACCCTGT 
GTGCCAGGACTTACTCTTTTGTGCAGGACATGTGACAATGAAGGACGGATGTATGA 
CCTTGATGGTTGTGGTGTTGGGGTCCTGATTTCAGCATTCATGAGTCAGGGGAAGG 
TCCCTGCTAAGGACAGACCTTAGGAGGGCAGTTGGTCCAGGACCCACACTTGCTTT 
CCTCGTGTTTCCTGATGCTGCCTTGGGTCTGTAG

Sequence ID 938

TGGCCATCCTTTTCCCCCCAAACACACCCCCTTAACCTATCTCTTGGGACTTAGCC 
CGACCCTCCCTCTCATTTCCCATTAAGTCTGAGAGGCAAGAGCTAGGTTAGGCAAG 
GAGGTGGTTGGCCAGAGATGGGGAACAGCCAGGTGCCCCAGTCCTCTGATTTTTCC 
TCCATCCTGCTTACCACCTCCCTGGGTACTTACAGCCTTCTCTTGGGAACAGCCGG 
GGCCAGGACTGGGTCACCTATGAGCTGAATCAGCATCTCCTCCTGAGTCCCAGGGC 
CCCTGCAGTTCCCAGTCTCTTCTGTCCTGCAGCCCTTGCCTCTTTCCCACAGGTTC 
CACTTTATATCCACCTTTTCCTTTTGTTCAATTTTTATTTTTATTTTTTTTATTAT 
TAAATGATGTGGTCTATGGAAAAAAAAATAAAAATCTGACTTAGTTTT 

Sequence ID - 939 nt : 513 

GGAACCCAGTGTATTACCTGCTGGAACCAAGGAAACTAACAATGTAGGTTACTAGT 

GAATACCCCAATGGTTTCTCCAATTATGCCCATGCCACCAAAACAATAAAACAAAA 

TTCTCTAACACTGCAAAGAGTGAGCCATGCCTGTTAACACTGTAAAGAATGTAACA 

TGTGGGGGACACACAGGGGCAGATGGGATGGTTTAGTTTAGGATTTTATTAGTGCA 

TGCCCTACCCTCTGGGGGAACGTCCCATCTGAGGTTTTCTTCTCGGtGGGGGGATT 

TAACTTCTGTCCTAGGGAAAACAGTGTCTGATGAGGAGTGTTTCCAACACAGGCTA 

CATGAATTCCCCTATACCAGTGCGAAAGCAGCCAGGAGTCCCCGTTGGAAAAGAAC 

AATGCCACTCTCTTTTATGTATCTTGGTTCTGCAACTCATTTGTTGTAAGTAGGGT 

TAATCGAGTATCAGGTTCACAGTATCCTGCCCTTATTATTTTATGATTCACTGACT 

CAAGTTCCA 

Sequence ID 947

GAGAGTGAAAAAATTCTGGTACAAATTGGGAAATTAGTATATAACAACATAGTGTT 
AAATTCAATGGGAAAAGTTTAATAAGAGGATTTGGTATCAACTGGCTGTCCAAAGA 
TAAAAATGGACCGTCCTATCACATACAAAATTGTTTTTTAGATAAAGATTTAAATA 
CAGGCACTCCTTCATTTGCGTGGTGCACCTTGAGGTGTTGCAGAAATGATGAGAGC
TGAAACTGCAAAGCAATTTTAATACTTTATCTGTTGGAT^ATCTTATAGTTTTCCTG 
TGACCGTTAAAATTTTCATTAAACTATTAAAAACACCGATGACTGGTCACAAATGT 
ATTGGGAAATGGAAAAGAATTAATACACTAAAAATACAAAAAATAGAAAATATTTA 
AAATTATCTAAAAATTTGAAACATTAGAAAAATTGAGAACTAGGCAGGGCGTGGTG 
GCTCACATCTGTAATTTTAGCCCTTTGGGAGGCTGANGCAGGTGGAT(5ACCTGANG
TCAGGAGTTCGAGACCAGCCTGCCAACGTGGGGAAACCCCGTCTCTACTGAAAATA 
CAT^AAATTANCCGGGCATGGTGGCACAAGCCTGTAATNCTT 
GAGGCAGGAGAATCACTTGAACCCANGANG 

Sequence ID 949

GTTTCACATGAGAAGGTAGTATTATGTACAGTGACCTTGTTTAAAGTGTCNGTTTA 
ATGTTACCACTAAGGCCCTGCCCCAGCTTTATCACCTGAGCACTAACAAGTGCTGT 
GTGGAGTTCAGTCCATGCTGGTAACTNTTGAGTATTCAGTGGGTCTTTTAACAATT 
ACCACCGTGGAGGANANAGGAAGGAAGAGAAATGCTGTGATCTTTTNCTGTTTTTA 

ATTAGNGAAAGAGGGrATTANATTAAACAAATGTTACAGAGNTGTGACTNTGATCCC

CCAGNGGTAAGCAATAATTGTANAGACTGGATTTNANAAGCCCTGAGAGTTTATTT 
TCAACCTATNTATTATAGNNCAATCC 

Sequence ID 1028 
ACAAGGCTTGGGGGCTGGACTCCCTCTACTGCCTCTGGCCATACCCCCTCCTGGAG
ATGGGGTCAAGGCACCAGGACTGA 

Sequence ID - 1056 nt : 435 

TCGCTTGTAAAGCCTGAGACAGCTGCCTGTGTGGGACTGAGATGCAGGATTTCTTC 

ACACCTCTCCTTTGTGACTTCAAGAGCCTCTGGCATCTCTTTCTGCAAAGGCATCT

GAATGTGTCTGCGTTCCTGTTAGCATAATGTGAGGAGGTGGAGAGACAGCCCACCC 
CCGTGTCCACCGTGACCCCTGTCCCCACACTGACCTGTGTTCCCTCCCCGATCATC 
TTTCCTGTTCCAGAGAAGTGGGCTGGATGTCTCCATCTCTGTCTCAACTTCATGGT 
GCGCTGAGCTGCAACTTCTTACTTCCCTAATGAAGTTAAGAACCTGAATATAAATT 
TGTTTTCTCAAATATTTGCTATGAAGGGTTGATGGATTAATTAAATAAGTCAATTC
CTGGAAGTTGAGAGAGCAAATAAAGACCTGAGAACCTTCCAGA 

Sequence ID 1071
NGATATAGTNCCGCATGGGAAAGATG^ 

GACTAACCCCCCTGCCTTCTGCATAATGAATTAACTAGAAATAACTTNGCAAGGAG 
AGCCAAAGCTAAGACCCCTGAAACCAGACGAGCTACCTAAGAACAGOT 
ACACCCGTCTATGTAGCAAAATAGTGGGAAGATTTATAGGTAGAGGCGACAAACCT
ACCGAGCCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAGTTCAACTTTAAATT 
NGCCCACAGAACCCTCTAAATCCCCTTGTAAATTTAACTGTTAGTCCAAAGAGGAA 
CAGCTCTTTGGACACTAGGAAAAAACCTTGTAGAGAGAGTAAAAAATTTAACACCC 
ATAGTAGGCCTAAAAGCAGCCACCAATTAAGAA 
ACCTAAAJU^TCCC^AACATAT^

TCACCCTATAGAAGAACTAATGTTAGTATAAGTAACATGAAAACATTCTCCTCCGC 
ATAAGCCTGCN 

Sequence ID - 1074 nt: 689 

GGGAGGCGGAGGCTGCAGTGAGCTGAGATCGTGCCACTTCATTCCAGCCTGGGCAA
CAAAGCGAAACTCTGTCTCAAAAAAAAAAAAAAAAAAAATTTGTTGACTGTTG 
TTTAAAGCTTGTCATTTTTTATTTAGTAATAACACTCATTAGTGTAGTATCTATGA 
TGAACCAGGTTCTGCACAAAGTACCTTATGTTCATGGCCTCATATCGTCTTCTCCA 
AAACTCTGCAAGATAGGATTCATCACCACTTATAGGGAGAGATCTGAAAGTTTAAA 
ATTGTACCCAAGGTCACACAGCTGGTAAGTGCCAGAGCTGGGATTCCGTAGGGTGT
TCANAGTGCCTCTCCTGCCGTAGGCTTATCACAAAAAGTCAAAGTTTGGTCATAAT 
AAAGCCTGAAGTTTGGCAGGATTTAAAAATAGTCACCANACTTTTGAGTTGGAGCA 
TCCCACCTCACTGCTGTTCACCTTCTGTGGCAGGGAGAGTCATCATTTCCATTTCA 
GCTTGTGGAATATCTTGTCATTAACATTCTCATGCAAAAGCCATTTTATGGTGCCC 

AATGAANATGGTTAAGCTACTGCCCCAAGCCTNTGGAAGCCTTCCTAATTTTGGAC

TTGCACTATGCAAATTGNATAATATTTTCTCTACCCTAAGCCAAATATTTTCTTCA 
CTTTTCATTCATTCTAC 

Sequence ID 1081 

CGCCGCCGCGCCGCCGTCGCTCTCCAACGCCAGCGCCGCCTCTCGCTCGCCGAGCT

CCAGCCGAAGGAGAAGGGGGGTAAGTAAGGAGGTCTCTGTACCATGGCTCGTACAA 
AGCAGACTGCCCGCAAATCGACCGGTGGTAAAGCACCCAGGAAGCAACTGGCTACA 
AAAGCCGCTCGCAAGAGTGCGCCCTCTACTGGAGGGGTGAAGAAACCTCATCGTTA 
CAGGCCTGGTACTGTGGCGCTCCGTGAAATTAGACGTTATCAGAAGTCCACTGAAC 
TTCTGATTCGCAAACTTCCCTTCCAGCGTCTGGTGCGAGAAATTGCTCAGGACTTT
AAAACAGATCTGCGCTTCCAGAGCGCANCTATCGGTGCTTTGCAGGAGGCAAGTGA 
GGCCTATCTGGTTGGCCTTTTTGAAGACACGAAlCCTGTGTGCTATCCATGCCAAAC 
GTGTAACAATTATGCCAAAAGACATCCAGCTAGCACGCCGCATACGTGGAGAACGT
GCTTAAGAATCCACTATGATGGGAAACATTTCATTCTC 

Sequence ID - 1083 nt : 198

GCGCGTCGACTTTGTTTAGACATTGAATGACTTTGTTAAAGGCACAATTAATCACA
TTGGTTGTACTCTGNNGACAGCCTTCTTTAAAAAAAAAATAAACAATTTAAAAC^ 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
AAAAAAAAAAAAAAAAAAAAANTTTTAACC 

Sequence ID - 1084 nt : 198

GCGCGTCGACTTTQTTTAGACATTGAATGAeTTTGOT 
TTGGTTGTACTCTGNNGACAGCCTTCTTTAAAAAAA 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA^ 
AAAAAAAAAAAAAAAAAAAAANTTTTAACC 


Sequence ID - 1099 nt : 561 

TGCATGCTTGTGGATTGGAAAAACTTTGGAGACTGATTACTTTTCATTATATATGT 
GTCACAGTGAAACAGCTTTTATGTGTCATGTAAGATTACTGCTTGCCTCTCTAAGG 
AAGGTCGTGACTGTTTAAATAGACGGGCAAGGTGGAACCTTTTGAAAGATGAGCTT 

TTGAATATAAGTTGTCTGCTAGATCATGGTTTGTATTGAACTAACAAGGTTTGCAG
ATCTGCTGACTTATATAAAGCTTTTTGATTCCTACTAAGCTTTAAGATTTAAAAAA 
TGTTCAATGTTGAAATTTCTGTGGGGCTCTATTTTTGCTTTGGCTTTCTGGTGAGA 
GAGTGAGGAAGCATTCTTTCCTTCACTAAGTTTGTCTTTCTTGTCTTCTGGATAGA 
TTGATTTTAAGAGACTAAGGGAATTTACAAACTAAAGATTTTAGTCATCTGGTGGA 

AAAGGAGACTTTAAGATTGTTTAGGGCTGGGCGGGGTGACTCACATCTGTAATCCC

AGCACTTTGGGAGGCCAAGGCAGGCAGAACACTTGAAGGAGTTCAAGACCAGCGTG 
G 

Sequence ID 1109 

TTTGNCGGTNTTGGANNNNNANAANTTTCTTC

AATTAANATGGNTTTNGNGGGTTCNTTNC™ 

NTCNTNCNNNTTCCTTNNCCCTNAANCTACCTTCCCCCNATTTTCTCCCCTNTTCN 
TNAATTANCATCCTCTCCNCNTANNTCNANACNTTAATGGCAANACTAT^ 
CNANNATAANANCTCCTGTNlSn^CCACAT^ 
CCNCAGAGTNAACTCATCCTCNNCI^^

GCGANATATTAANNANACCNGTANNTNNN^ 
NANNTTTTAGCNTCNNGCNNTAACNNM 
ATNATNCTNCNNCGAANTNTCANNCNTCTCCNCT^
NTNCCTCGNimANAGCACTA^ 

TGGTCCANTNNCGTTNGNCGCNMNNAATNTTCGTNNNNCNN 

Sequence ID 1118 

GGATTTTAGAGGAAGGCGCTNGGTTACATTGGAGAACTGGAGTGGTCTGGAGTTCC 
ACGGTGTAGTGGACCAGAGGCCACCTCTCCTGGGCTTCTCAGTGTCTCGCCGGCGG 
GGTTCGGCCTGAGCTGGATTGACATAGCCCTTGGCGGATTTAAACAACCTAAACAT 
TAAGCAGTACAGCTGCCTCAAACCTTTGGGATTTTCAGAATGACTGACACTGCCGA 
AGCTGTTCC^AAGTTTGAAGAGATGTTTGCTAGTAGATTCACAGAAAATGACAAGG 
AGTATCAC^AATACCTGAAACGCCCTCCTGAGTCTCCTCCAATTGTTGAGGAATGG 
AATAGCANAGCTGGTGGGAACCAAAGAAACAGAGGCAATCGGTTGCAAGACAACAG 
ACAGTTCAGAGGCAGGGACAACAGATGGGGGTGGCCAAGTGACAATCGATCCAATC 
AGTGGCATGGACGATCCTGGGGTAACAACTACCCGCAACACAGACAAGAACCTTAC 
TATCCCCAGCAATATGGACATTATGGTTACAACCAGCGGCCTCCTTACGGTTACTA 
CTGATAGAAATGTTGGCAGCTTTTAGTAAAAGCATTTACTCTGTTACCATGAGAAA 

Sequence ID 1125 

NGACTGGCTCCCGAAAAGAAGGGTGGCGAGAANAAAAAGGGCCGTTCTGCCATGGA 
CGAAGTGGTAACCCGCGAATACACCATCAACATTNACAAGCGCATCCATGGAGTGG 
GCTTCAAGAANCGTGCACCTCGGGCACTCAAAGAGATTCGGAAATTTGCCATGAAG 
GAGATGGGAACTCCATATGTGCGCATTGACACCAGGCTCAACAAANCTGTCTGGGC 
CAAAGGAATAAGGAATGTGCCATACCGAATCCGTGTGCGGCTGTCCANAAAACGTA 
ATGAGGATGAAGATTCACCAAATAAGCTNTATACTTTGGTTACCTATGTACCTGTT 
ACCACTTTCAAAAATCTACAGACAGTCAATGTGGATGANAACNAATCGCTGATCGT 

CAGATCAAANAAANT

Sequence ID - 1139 nt : 503 

CAGCACTGCCAGTGGAGATGGGCGTCACTACTGCTACCCTCATTTCACCTGCGCTG 

TGGACACTGAGAACATCCGCCGTGTGTTCAACGACTGCCGTGACATCATTCAGCGC 

ATGCACCTTCGTCAGTACGAGCTGCTCTAAGAAGGGAACCCCCAAATTTAATTAAA 

GCCTTAAGCACAATTAATTAAAAGTGAAACGTAATTGTACAAGCAGTTAATCACCC 

ACCATAGGGCATGATTAACAAAGCAACCTTTCCCTTCCCCCGAGTGATTTTGCGAA 

ACCCCCTTTTCCCTTCAGCTTGCTTAGATGTTCCAAATTTAGAAAGCTTAAGGCGG 

CCTACAGAAAAAGGAAAAAAGGCCACAAAAGTTCCCTCTCACTTTCAGTAAAAATA 

AATAAAACAGCAGCAGCAAACAAATAAAATGAAATAAAAGAAACAAATGAAATAAA 

TATTGTGTTGTGCAGCATTAAAAAAAATCAAAATAAAAATTAAATGTGAGCAAAG

Sequence ID - 1148 nt: 587

TGA?^AAATAAAGTTTTTATGTATATTCTACATATGTATATGTTGGTAGAAAGCAAA 
AACGCTAGGTAAAAATAAATGTAATACAATTTTAGCTATGAACCAAAAAACCATTT 
GTGGTGTGGATGCAAGAAAGTCTGGATGGGTGCAGAGTTCTCCATGTTTCACTTCT
GACATTTGAAAATACGCAGTTTGC^TTTGATACGTCAAATGTTATTTTTAAGAAAA
CCAATAAAATCATTAAAACCGAAAAGGCAGTTTTGCTTGTTTTTACCTTAGTTGGA 
GTTATCTGCAATTGCCGTATTAGTGTTTTAAGGAACTTGTAAGTAAGCTCCTTAGT 
CCCCTTTAGAGCTACGAAACATGTCAATTTTACTTTTCTCCAGCTTTTTGGAATCT 
TATCTAAATTACCATGTAGAGTTCTGCATAGCTTCAAATTCTCTTAGCCAATGTGG 
TCTGTAAGTGTCTATCGATGAATTTCACCGTTAATTGCCGTAGTATACTGTCCTGT
ACCGGATGTGAAGAGGAGCAACTCTGC^CAGTGCACTGGTTGCTCCCAT 
ANGAATGGCTTATCAATGGTCGGATTT

Sequence ID - 1160 nt: 650 

GGAGGATGGAGCAGT0AGCGGGTGTGGGCGGCTGCTGGCAGCGCCATGGAGACGGT
ACAGCTGAGGAACCCGCCGCGCCGGCAGCTGAAZy^GTTGGATGAAGATAGTTTAA 
CCAAACAACCAGAAGAAGTATTTGATGTCTTAGAGAAACTTGGAGAAGGGTGAGTG 
TAAAGAAACTATAGGTAGGTCATTGGGTCCCAGTCTTTTTCCTGCCCCAGAAGAAG 
CAGAAGGATATGAACCTTTCAGGATTGTTCTAGGTGGGGTGGAAGGTAAATTTACA 
GCTTGTGATGTCCTTCTTCGCTTTACTCCAATCCCTATTATAGACAGATTTAGTGA
TTCCTGGTCTTTTTAACACGAAGAATATCTATTGTTTTCTCTTTTGTAGGATCTGT 
ATGATTTTATCTACTTAACAGATAGCACTAATTAGATTl^AAATTCTATAAGAAACT 
TTTTAATTTGCTGTTCATAATTTCTGATTGGTATGCAATAACTGTTTCAATGAAAA 
TCAATGTAATTTAGTATTTTAATATTTGCACCTTTGTGAAATATAGTAAATAAATT 

AAGCACTATCACCACCTTCACAGCTACTTAGGAGATCCACAATCCTGGGTTGGGAG

CCAGTGGATTTCCTGAAACACAGATTTGTTAATG 

Sequence ID - 1165 nt : 502 

CTCAAGTGAATCCTGGCTTCTTGGAAGCGCTTGCCTAGACGAGACACAGTGCATAA 
AAACAACTTTTGGGGGACAGGTATGTTTTCTTGCAGCTGCGGTTGTAAGGTCTTGG
CAAGACAAGCAGTGTGGCCAGAATTTTGAACTTCTGATGAATGTGTAATGGAAAGG 
ACCTTGTACATTTTTTTGTTTCAAGGTCCTCAAAATGAGCACATGAAGAGGTTGCT 
GTGAAACTTTAAGTGGCCCTACTGCGCAGAAGCATTCAGATGTCACTTGATGATCT 
GTAAGGGAACTTGCTGATTTGGGAATGTGCTTAGGGAACACACATTCCTTTTGACA 

GGGTCTGTCACTGGGTGGGTGATGAATTATACAGATGACATGTGCTTTTTTTTCTT

TTTTCAACCTCAATGGTATTCCTACAGGAAATGGATAACCATTTTAACTGTATTTT 
TTGCAGCCCGTACCTTCTTGGGAATACAATTGTCTAACTTTTTATTTTTGGTCT

Sequence ID - 1172 nt: 648

CCACAATAATAAGAGAAAAAGAGGAGGAAAAGGATATAC^ 

AATAACAAAGTGACAGGAGTAAGTCCTTAACTGGCAATAATAACCATGAATCTAAA 
TGGATTCCATTTCCCACTTAAAAGATAAAGACATGCTGAATGGATAAAAAGCTGTC 
ACCCAGTTATATGCTGCCTACAAGAAACT 

TGGAAAGAGAAGGCATGGGAAAAGATACTCTACTCAAATGAAAACT^AAT^ACCAAAC 

AAAGGTGGCTATTCTTATATGAGATAATACAGACATTAAATCAAAAACTGGAAACA 

AACACAAAGTCATTGTATAATGATGAATTCAATTATATCATGATGAATTCAATTAT 

ATCCTCCTTCCTGATCAATTCAGAAAGGAGGATATAATCTTTTTAAATATATATAC 

ACCd^C^CCAGAGCATATAAATATGTAAAGGAAGATAAAGGGAGTCC^ 

AGAATAAATATAAGAATTATAAATATTTTATCTAAAGTGATAGATAGACTGTAATA 

CAATAATAGGGTGGTGACATTAAC^CCCCCTCTCACATTGGACTGATCATCTAGAA 

GGGAGAAAAAGCTTTATGATTGGAAAAGCCAT 

Sequence ID 1178 

ATTGTGTTGGCCACCCGGGAATTCGCGGCCGCGTCGACCTACGCACACGAGAACAT
GCCTCTCGCAAAGGATCTCCTTCATCCCTCTCCAGAAGAGGAGAAGAGGAAACACA 
AGAAGAAACGCCTGGTGCAGAGCCCCAATTCCTACTTCATGGATGTGAAATGCCCA 
GGTGAGGAGACGGCTTGCTGTAGTGGGGAAAGCACTGGACCTCAACAGTTGGAAAA 
TGTTGTAGTGTTAGCTGTCTCGTATCCTTGAAGCTGTGCAGCAGCTTCAGTTTCTT 
CGCCTGTGGAAAATATTTTCCCTGATACTCTTAAAATTTGAATGTATGAGACTGGC 
AAAGTTTTGCATCTTAGGAGGAGTGATTCATTTCACCGTGATCTCTCATCACATTT 
CACATACAACCCCTACGTTTTTTTGTGTTGGGAAACAATGTAATGGATGATGAGTT 
GGGC^TAAGTGCAGGAAAGACGGGTGTAATAGAGGAAAAAAATGTTATCTGCTTTT 
CTTTCAGKjATGCTATAAAATCACC^ 

GTGTGTTGGCTGCTCCACTGTCCTCTGCCAGCCTACAGGAGGAAAAGCAAGGCTTA 
CAGAAGGATGTTCCTTCAGGAGGAAGCAGCACTAAAAGCACTCTGAGTCAANATGA 
GTGGGAAACCATCTCAATAAACACATTTTGGAT 

Sequence ID - 1180 nt : 622 

CTTTTCCTCCCGCTGTCCCCCACGGGAGGGGACTGCTCTCCCCCGCTGCATCCTTT 

CTGTGAGGTACCTTACCCACCTCAGCACCTGAGAGGGTGAAATAGAATTCTAACCT 

CGACATTCGGGAAGTGTTTTTGAGAAGTCTCGGTCGGTAAGGGAAGTCTTCCAAGT 

CCGTGCAGCACTAACGTATTGGCACCTGCCTCCTCTTCGGCCACCCCCCAGATGAG 

GCAGCTGTGACTGTGTCAAGGGAAGCCACGACTCTGACCATAGTCTTCTCTCAGCT 

TCCACTGCCGTCTCCACAGGAAACCCAGAAGTTCTGTGAACAAGTCCATGCTGCCA 

TO^GGCATTTA^TG^GTGTACTATTTGCTTCCAAAGGATCAGGCCCTGAGAACA 
ATGACCTTATTTCCTACAACAGTGTCTGGGTTGCGTGCCAGCAGATGCCTCAGATA
CCAAGAGATAACAAAGCTGCAGCTCTTTTGATGCTGACCAAGAATGTGGATTTTGT 
GAAGGATGCACATGAAGAAATGGAGCAGGCTGTGGAAGAATGTGACCCTTACTCTG 
GCCTCTTGAATGATACTGAGGAGAACAACTCTGACANCCACAATCATGAGGATGAT 
GTGTTG 

Sequence ID - 1181 nt : 155 

CGCCACTTATCCAGTGAACCACTATCACGAAAAAAACTCTACCTCTCTATACTAAT 
CTCCCTACAAATCTCCTTAATTATAACATT 

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEA

Sequence ID 1182

CATTGTGTTGGCNCCCGGGAATTCGCGGCCGCGTCGACTTT7TTGTGTTGTTTGGAG 
CAGAAATACTAAAGAAGATTCCGGGCCGAGTATCCACAGAAGTGGACGCAAGGCTC 
TCCTTTGATAAAGATGCGATGGTGGCCAGAGCCAGGCGGCTCATCGAGCTCTACAA 
GGAAGCTGGGATCAGCAAGGACCGAATTCTTATAAAGCTGTCATCAACCTGGGAAG 
GAATTCAGGCTGGAAAGGAGCTCGAGGAGCAGCACGGCATCCACTGCAACATGACG 
TTACTCTTCTCCTTCGCCCAGGCTGTGGCCTGTGCCGAGGCGGGTGTGACCCTCAT 
CTCCCCATTTGTTGGGCGCATCCTTGATTGGCATGTGGCAAACACCGACAAGAAAT 
CCTATGAGCCCCTGGAAGACCCTGGGGTAAAGAGTGTCACTAAAATCTACAACTAC 
TACAAGAAGTTTAGCTAGZ^AAACCATTGTCATGGGCGCCTCCTTCCGCAACACGGG 
CGAGATCA2^AGCACTGGCCGGCTGTGACTTCCTCACCATCTCACCCAAGCTCCTGG 
GAGAGCTGCTGCAGGACAACGCCAAGCTGGTGCCTGTGCTCTCAGCCAAGGCGGCC 
CAAGCCAGTGACCTGGAAAAAATCCACCTGGATGAGAAGTCTTTCCGTTGGTTGCA 
CAACGAGGACCAGATGGCTGTGGAGAAG 

Sequence ID - 1183 nt : 479 

CGTGGCAGCCATCTCCTTCTCGGCATCATGGCCGCCCTCAGACCCCTTGTGAAGCC 
CAAGATCGTCAAAAAGAGAACCAAGAAGTTCATCCGGCACCAGTCAGACCGATATG 
TCAAAATTAAGCGTAACTGGCGGAAACCCAGAGGCATTGACAACAGGGTTCGTAGA
AGATTCAAGGGCCAGATCTTGATGCCCAACATTGGTTATGGAAGCAACAAAAAAAC 
AAAGCACATGCTGCCCAGTGGCTTCCGGAAGTTCCTGGTCCACAACGTCAAGGAGC 
TGGAAGTGCTGCTGATGTGCAACAAATCTTACTGTGCCGAGATCGCTCACAATGTT 
TCCTCCAAGAACCGCAAAGCCATCGTGGAAAGAGCTGCCCAACTGGCCATCAGAGT 
CACCAACCCCAATGCCAGGCTGCGCAGTGAAGAAAATGAGTAGGCAGCTCATGTGC
ACGTTTTCTGTTTAAATAAATGTAAAAACTG

Sequence ID - 1185 nt: 628

CTTTGATTACCTTTGAGTATTAGGTTGAAAGCTTCTCTGTGCTTGATTGAACATTG 
TGATGATGTTGATTGGGTCATGTCAGATTTAGACAGTGTTGTGTTTAAGATAAATG 
TTTAATGGCTCTTAGCAGTGTTCATGCCTCCCCTTTTCCCCTGATACTTTAAAAAC 
AGAATATACAGAAAAGGGGAGTTGGGTGAAGAATCACCATATTCTCATTACCAGAG 
TAGTGTCTACCAGCTGTTTTCACATTTTTCTGTTTCCTTCTGTCCTTGGAATCCTT 
TTTTTAGATCCTTGTAATACTAGTAAAGATATTCCACTCTGTGTTGTAAGCATTTT 
TCCATTTTGCTCCATGGTCTTCATAATGCCCTGTGGTCCTTTATTAAGGGGATGCA 
CCATGTAGAGGTGAAAGGCTTTCCTTGACTTGGCCACCATTTCTGTATTTTCCTTA 
GAGGAGGAGGTTTCC^CATTTCT^ 
GGCAGGAGTGCAGTGGCATGATAACAGCT 
AGTTATCCTCCCACCT^ 
CACCCAGCTAAT 

Sequence ID - 1186 nt: 494

CAGCCCTCCGTCACCTCTTCACCGCACCCTCGGACTGCCCCAAGGCCCCCGCCGCC 
GCCTCCAGCGCCGCGCAGCGACCGCCGCCGCCGCCGCCTCTCCTTAGTCGCCGCCA 
TGACGACCGCGTCCACCTCGCAGGTGCGCCAGAACTACCACCAGGACTCAGAGGCC 
GCCATCAACCGCCAGATCAACCTGGAGCTCTACGCCTCCTACGTTTACCTGTCCAT 
GTCTTACTACTTTGACCGCGATGATGTGGCTTTGAAGAACTTTGCCAAATACTTTC 
TTCACCAATCTCATGAGGAGAGGGGAACATGCTGAGAAACTGATGAAGCTGCAGAA 
CCAACGAGGGTGGCCGAATCTTCCTTCAGGATATCAAGAAACCAGACTGTGATGAC 
TGGGAGAGCGGGCTGAATGCAATGGAGTGTGCATTACATTTGGAAAAAAATGTGAA 
TCAGTCACTACTGGAACTGCACAAACTGGCCACTGACAAAAATGAC

Sequence ID - 1188 nt : 599 

GGGAGACAAGCCCAGCCTTTCGGCGAGNATACGTCTAACCCTGTGCAACAGCCACT 
ACATTACTTCAAACTGAGATCCTTCCTTTTGAGGGAGCAAGTCCTTCCCTTTCATT 
TTTTCCAGTCTTCCTCCCTGTGTATTCATTCTCATGATTATTATTTTAGTGGGGGC 

GGGGTGGGAAAGATTACTTTTTCTTTATGTGTTTGACGGGAAACAAAACTAGGTAA
AATCTACAGTACACCACAAGGGTCACAATACTGTTGTGCGCACATCGCGGTAGGGC 
GTGGAAAGGGGCAGGCCANAGCTACCCGCAGAGTTCTCAGAATCATGCTGAGAGAG 
CTGGAGGCACCCATGCCATCTCAACCTCTTCCCCGCCCGTTTTACAAAGGGGGAGG 
CTAAAGCCCAGAGACAGCTTGATCAAAGGCACACAGCAAGTCAGGGTTGGAGCAGT 

AGCTGGAGGGACCTTGTCTCCCAGCTCAGGGCTCTTTCCTCCACACCATTCAGGTC
TTTCTTTCCGAGGCCCCTGTCTCAGGGTGAGGTGCTTGAGTCTCCAACGGCAAGGG 
AACAAGTACTTCTTGATACCTGGGATACTGTGCCCAGAG

Sequence ID 1189

GGGAGACAAGCCCAGCCTTTCGGCGAGATACGTCTAACCCTGTGCAACAGCCACTA
CATTACTTCAAACTGAGATCCTTCCTTTTGAGGGAGCAAGTCCTTCCCTTTCATTT 
TTTCCAGTCTTCCTCCCTGTGTATTCATTCTCATGATTATTATTTTAGTGGGGGCG 
GGGTGGGAAAGATTACTTTTTCTTTATGTGTTTGACGGGAAACAAAACTAGGTAAA
ATCTACAGTACACCACAAGGGTCACAATACTGTTGTGCGCACATCGCGGTAGGGCG 
TGGAAAGGGGCAGGCCAGAGCTACCCGCAGAGTTCTCAGAATCATGCTGAGAGAGC 
TGGAGGCACCCATGCGATCTCAACCTCTTCCCCGCCCGTTTTACAAAGGGGGAGGC 
TAAAGCCGAGAGACAGCTTGATCAAAGGCACACAGCAAGTCAGGGTTGGAGCA 
GCTGGAGGGACCTTGTCTCCGAG 

TTCTTTCCGAGGCCCCTGTCTCAGGGTGAGGTGCTTGAGTCTCCAACGGCAAGGGA 
ACAAGTACTTCTTGATACCTGGGATACTGTGCCCAGAGCCTCGAGGAGGT 

Sequence ID 1190 

GTTTAAATTTGACAAACTAAAGCTNATNACTGCTATAAGAGTAATAAC 
TTCCATAACTCATTCTTAAAGTTTTAGTAATGTAAAAGTT^TTTTTTTGCAGTAAG 
TTATAATGATAGAAGCTTACATGTTTTTTCATGCCTCATCTGTTTCCCCTTAAAAC^ 
TATAATTATCAGTAAAGTCCTGTGGTATTTTTCAATTTGTAAGAAACTAGGCTATA 

TATAGATTGGGAAAAACAGCCTTCAT^ 
TGGTAATTGTGTGCTATTGCTTTTTGTTGACTTGCAAAAAAAAAAAAAAAAAAATT
ACTATGACTTGNGGTAGCCCTGGAACCTTCGGAAGTGCTTAGCCCAGTCTGACCAT 
ACATTTATATTTANAATGCTTAGGTAAATAAATAATATGCCTAAACCCAATGCTAT 
AAGATACTATATAATATCTCATAATTTTAAAAATCACTGTTTTGTATAATAATAAA 
ACAAGGCAGGCAAGCTGTTCTACAATGACTGTTGGTAAGGGTGCTGAGGAAGAAAA 
ACAAACAATCTTGATTCAGGGATAGTGAATAGACAAAAAATGTCCTAATCAATGAA
GCTGTGTGATGATTCTGATTGACAGAGA 

Sequence ID 1191 

GTGCAAAGTGTTATATCCACTTTCAACAAAGAGAGAAGCTGAAAAGCTAACCCAAT 
GTTAATTTTGGATCACACACATTCAGTGTAGACTTTAAGATTTTACTTCTGTTGGA 
GTAGCTATATTATTTCTAGTTAAAAAACTCTCTATATACATATTTATTTGTTTTTC 
TACTTGTTTAATATTTTTCTCTTCCAATTAGGAACTCAATATGGAATAAAAAATAT
TTAAATGTATTTTACTCAAACGTGTGTGTATATATGTTTGTGTGCATGATAAGGAG 
AGTGAGAGCAAGAGTAAGAGAGAGAGAGCACGCATAGATGGAAGCACACATTTAAT 
GTCTATGAAATGAGAAAACATTAAGGCTAAGATATTTTTCCTTCTGAACTAGCAGA 
TTGTATCAATGGCTGGTCACTTAAATTAATCAGTTTGTAAAGATATTTAAAAGGTA 
TGTCTACCTTCTTGCAATTAATTTGATTATGTTCTAATGGC^TGGCAAGAGAAATG 
AAAGAAGATAACTAAAAGTTAAAAGTCGTTGCATGTTTTTGTTGCAGCATACCCTT
CTTTCAGGCTACCGAATAACCTTGATTGACATTGGATTAGTAGTAGAATACCTCAT 
TGGTAGAGCATATCGCAGCANCTACACTAGAAAACAT 

Sequence ID 1192

GTCTGGAACTCCAGACCTCAGGTGATACCCCTGCCTCAGCCTCCCAATGTGCTGGG 
ATTACAGCTGTGAAGCCACCGCGCCCGGCTGCTGTGATAGTTGAGATGTAAACCAA 
AAATAAAATTCTAAGCO^CCCAATCCGACTGAATGGACCCTTCCTGTTGAGCAAGG 
ACATTCCAAAGTAAACTGAAAAGACCAGCTTAGGCCATGATGGGAAGGGGAGGTGT 
CAACATGCCTC^TTCTACCTTC

TAACATTAAAAGAGAGATCTTAAGCTGGGCACGGTGGCTCAT 

CACTTTGGGAGGCCAAGGTGGGATCACCTC 

CCGGTATGGTpAAGCCATGTCTCTACTG 

TGCA

Sequence ID 1193 

TNCNTTTTTTTTCCCNCGGGAAAGCGCGCCATTGTGTTGGTCCCCGGGAATTCGCG 
GCCGCGTCGACGAGAAATGGCTTGAACCCAGTAGGCAGAGGTTGTAGTGAGCCCIAG 
AATNGGNCACCTGCACNTTTANCCNTGGGTGACAAAANTGAAAACTTTGTCTNAAA 
AAAAAAAAAAAAAAATTTTAANTNAAATNAAAAANCCT^

NGGGGGGGGNNTTTTTNGGGNTTNGNISTNTGGTAAAAANTNNNT 

GGGGCOSANNCCCCOT^ 

NTTNGGGGGGGGGGNTTTTNANCNTSTT^ 

GGGISHSIGCCCCCNNCCTTT^

AGGGNNGGGGGNANATNNCCCCCCCNGGNTTTTTTTTTTAA2\AANTNAANNGGGGG 

GGGlSnSINCTNANTNGGGGCNCCCANNGGGGGNTT 

CCNGNTTTTATmCCCCCCC^ 

TTTIOTGGNGGGNAAAAAAOTTT^^ 

TTTTNGGNAAANCCNNGGGGGGOT

TTTTNTTNTNTTTTTCT 

AAAA 

Sequence ID 1195

GTTCGTGACNTTCGGAGCTACCTGACAG 

AGAGTGTTAGGCCTGAGCTTGAGAGCCCTGGAGAGACGTGTGCACAAAATGTGACC
TGAGGCCCTAGTCTAGCAAGAGGACATAGCAC^

CCTTGCAGAAAATATGAGCAATTTGATATTAACTAACATCTTCAATGTGCCATAGA 
CCTTCCCACI^AAGACTGTCCAATAATAAGAGATGCTTATCTATTTTA 

Sequence ID - 1196 nt: 412

GTCGACGCGGCCGCGGTCGCTGGAGNCGATC^CTCTAGGCTCCAACTCGTTATGA 
AAAGTGGGAAGTACGTCCTGGGGTAC^GCAGACTCTGAAGATGATCAGAC^^GG^ 
AAAGCGAAATTGGTCATTCTCGCTAACAACTGCCCAGCTTTGAGGAAATCTGAAAT 
AGAGTACTATGCTATGTTGGCTAAAACTGGTGTCCATCACTACAGTGGCAATAATA 
TTGAACTGGGCACAGCATGCGGAlfiS^T

GATCCAGGTGACTCTGACATCATTAGAAGCATGCCAGAAGAGACTGGTGAAAAGTA 
AACCTTTTC^CCTACAAAAT^ 
TAATAAAATTTGCTTGTTTT

Sequence ID 1197

CCGCCAACATGGGCCGCGTTCGCACCAAAACCGTGAAGAAGGCGGCCCGGGT 
AT^GAAAAGTACTACACGCGCCTC^C^ 

CGAGGAGATCGCC^TTATCCCCAGCAAAAAGCTCCGCAACZ^AGATAGCAGGTTATG 
TCACGCATCTGATGAAGCGAATTCAGAGAGGCCCAGTAAGAGGTATCTCCATCAAG 

CTGCAGGAGGAGGAGAGAGAAAGGAGAGACAATTATGTTCCTGAGGTCTCAGCCTT

GGATCAGGAGATTATTGAAGTAGATCCTGACACTAAGGAAATGCTGAAGCTTTTGG 
ACTTCGGCAGTCTGTCCAACCTTCAGGTCACTCAGCCTACAGTTGGGATGAATTTC 
AAAACGCCTCGGGGACCTGTTTGAATTTTTTCTGTAGTGCTGTATTATTTTCAATA 
AATCTGGGACAA 

Sequence ID 1198 

CAGAGGTGGGAGGATTGCTTCAGTTCAAGAGTTTGAGACCAGCCTGGGTAACATGG 
CGAAACCCTGTCTTTACAAAAAATGCAAACCTTTGCCGCATGTGTTGGGGTGCGCC 
TGTAGTCCCAGCTTCTCGGGAGGCTGAGGTGGGGGGACCACCTGAGCCATGGAGGT 

TGAGGCTGCAGTGAGCCGTGATACCACCACTGTACTCTAGCCTGGGCCATAGAGTG

AGACACCCTGCCTCAGAAATA 

Sequence ID - 1199 nt: 439 

CCC^TCCCCTCGACCGCTCGCGTCGCATTTGGCCGCCTCCCTACCGCTCCAAGCCC 
AGCCCTCAGCCATGGCATGCCCCCTGGATCAGGCCATTGGCCTCCTCGTGGCCATC
TTCCACAAGTACTCCGGCAGGGAGGGTGACAAGCACACCCTGAGCAAGAAGGAGCT 
GAAGGAGCTGATCCAGAAGGAGCTCACCATTGGCTCGAAGCTGCAGGATGCTGAAA 
TTGCAAGGCTGATGGAAGACTTGGACCGGAACAAGGACCAGGAGGTGAACTTCCAG
GAGTATGTCACCTTCCTGGGGGCCTTGGCTTTGATCTACAATGAAGCCCTCAAGGG 
CTGAAAATAAATAGGGAAGATGGAGACACCCTCTGGGGGTCCTCTCTGAGTCAAAT 
CCAGTGGTGGGTAATTGTACAATAAATTTTTTTTTGGTCAAATTTAA 

Sequence ID - 1200 nt: 526 

CTGGAGACGACGTGCAGAAATGGCACCTCGAAAGGGGAAGGAAAAGAAGGAAGAAC 
AGGTCATCAGCCTCGGACCTCAGGTGGCTGAAGGAGAGAATGTATTTGGTGTCTGC 
CATATCTTTGCATCCTTCAATGACACTTTTGTCCATGTCACTGATCTTTCTGGCAA 
GGAAACCATCTGCCGTGTGACTGGTGGGATGAAGGTAAAGGCAGACCGAGATGAAT 
CCTCACCATATGCTGCTATGTTGGCTGCCCAGGATGTGGCCCAGAGGTGCAAGGAG 
CTGGGTATCACCGCCCTACACATCAAACTCCGGGCCACAGGAGGAAATAGGACCAA
GACCCCTGGACCTGGGGCCCAGTCGGCCCTCANAGCCCTTGCCCGCTCGGGTATGA 
AGATCGGGCGGATTGAGGATGTCACCCCCATCCCCTCTGACAGCACTCGCAGGAAG 
GGGGGTCGCCGTGGTCGCCGTCTGTGAACAAGATTCCTCAAAATATTTTCTGTTAA 
TAAATTGCCTTCATGTAAACTG 

Sequence ID - 1201 nt: 613 

CTTAAGTATGCCCTGACAGGAGNATGAAGTAAAGAAGATTTGCATGCAGCGGTTCA 
TTAAAATCGATGGCAAGGTCCGAACTGATATAACCTACCCTGCTGGATTCATGGAT 
GTCATCAGCATTGACAAGACGGGAGAGAATTTCCGTCTGATCTATGACACCAAGGG 
TCGCTTTGCTGTACATCGTATTACACCTGAGGAGGCCAAGTACAAGTTGTGCAAAG 
TGAGAAAGATCTTTGTGGGCACAAAAGGAATCCCTCATCTGGTGACTCATGATGCC 
CGCACCATCCGCTACCCCGATCCCCTCATCAAGGTGAATGATACCATTCAGATTGA 
TTTAGAGACTGGCAAGATTACTGATTTCATCAAGTTCGACACTGGTAACCTGTGTA 
TGGTGACTGGAGGTGCTAACCTAGGAAGAATTGGTGTGATCACCAACAGAGAGAGG 
CACCCTGGATCTTTTGACGTGGTTCACGTGAAAGATGCCAATGGCAACAGCTTTGC 
CACTCGACTTTCCAACATTTTTGTTATTGGCAAGGGCAACAAACCATGGATTTCTC 
TTCCCCGAGGAAAGGGTATCCGCCTCACCATTGCTGAAGAGAGAGACAAAAGA 

Sequence ID 1202 

GGAATTCGCGGCCGCGTCGACCTCTGCTCGAATTGACAGAAAAGGATTCTGTGAAG 
AGTGATGAGATTTCCATCCATGCTGACTTTGAGAATACATGTTCCCGAATTGTGGT 
CCCC^AAGCTGCCATTGTGGCCCGCCACACTTACCTTGCCAATGGCCAGACCAAGG 
TGCTGACTCAGAAGTTGTCATCAGTCAGAGGCAATCATATTATCTCAGGGACATGC
GCATCATGGCGTGGCAAGAGCCTTCGGGTTCAGAAGATCAGGCCTTCTATCCTGGG 
CTGCAACATCCTTCGAGTTGAATATTCCTTACTGATCTATGTTAGCGTTCCTGGAT
CCAAGAAGGTCATCCTTGACCTGCCCCTGGTAATTGGCAGCAGATCAGGTCTAAGC
AGCAGAACATCCAGCATGGCCAGCCGAACCAGCTCTGAGATGAGTTGGGTAGATCT 
GAACATCCCTGATACCCCAGAAGCTCCTCCCTGCTATATGGATGTCATTCCTGAAG 
ATCACCGATTGGAGAGCCCAACCACTCCTCTGCTAGATGACATGGATGGCTCTCAA 
GACAGCCCTATCTTTATGTATGCCCCTGAGTTCAAGTTCATGCCACCACCGACTTA
TACTGAGGTGGATCCCTGCATCCTCAACAACAATGTGCAGTGAGCAT 

Sequence ID - 1203 nt : 692 

TGCAGAGGGGTCCATACGGCGTTGTTCTGGATTCCCGTCGTAACTTAAAGGGAAAC 

TTTCACAATGTCCGGAGCCCTTGATGTCCTGCAAATGAAGGAGGAGGATGTCCTTA
AGTTCCTTGCAGCAGGAACCCACTTAGGTGGCACCAATCTTGACTTCCAGATGGAA
CAGTACATCTATAAAAGGAAAAGTGATGGCATCTATATCATAAATCTCAAGAGGAC
CTGGGAGAAGCTTCTGCTGGCAGCTCGTGCAATTGTTGCCATTGAAAACCCTGCTG
ATGTCAGTGTTATATGCTGCAGGAATACTGGCCAGAGGGCTGTGCTGAAGTTTGCT 

GCTGCCACTGGAGCCAGTCGAATTGCTGGCCGCTTCACTCCTGGAACCTTCACTAA
CCAGATCCAGGCAGCCTTCCGGGAGCCACGGCTTCTTGTGGTTACTGACCCCAGGG 
CTGACCACCAGCCTCTCACGGAGGCATCTTATGTTAACCTACCTACCATTGCGCTG 
TGTAACACAGATTCTCCTCTGCGCTATGTGGACATTGCCATCCCATGCAACAACAA 
GGGAGCTCACTCAGTGGGTTTAATGTGGTGGATGCTGGCTCGGGAAGTTCTGCGCA 

TGCGTGGCACCATTTCCCGTGAACACCCATGGGAGGTCATGCCTGATCTGTACTTC
TACAGAGATCCTGAAGAGAT

Sequence ID 1204 

TTTTTTTTTTTTTCCTGCGGGAAAGCGCGCCATTGTGTTGGTACCCGGGAAATTCG 

CGGCCGCGTCGACACAGGCCCCAGCATCAAGATCTGGGATTTAGAGAGGAAAGATC

ATTGTAGATGAACTGAAGCAAGAAGTTATCAGTACCAGCAGCAAGGCAGAACCACC 
CCAGTGCACCTCCCTGGCCTGGTCTGCTGATGACACAGGTTGGGCNGGNNCNCNGG 

GGNGGNNITOGNlSn^^ 

G^cisnsnsnsn^^ 

NNTNNNNGGGTNCNNNCNCNNNGGCGCGC

Sequence ID 1205 

CAGACTCTGACCCAGCCTCAGTCCTAACTCCTGGGGCTGGGCTGAGGGGAACAAGC 
ATTTGCTGAAACTTGAAAAAACAAAGCAAATCAAAAACAGGAAAAAATTGTACCTG 
GTACTTTTTTTTAGAAAAAAAGATTAAAAAAGAAAGAATAAATTCTTGTTTGGAAA
CTTGAAAA&AAAAAAAA&AAAAAAAAAAA&AAAAAAAAAAAAAAAAAAA 
AAAAAATTTTAAACTC 
TlSnSFNNNTNNOSJNC^
ACN 

Sequence ID - 1207 nt : 642 

ACGAGAAGCCAGATACTAAAGAGAAGAANCCCGAAGCCAAGAAGGTTGATGCTGGT
GGCAAGGTGAAAAAGGGTAACCTCAAAGCTAAAAAGCCCAAGAAGGGGAAGCCCCA 
TTGCAGCCGCAACCCTGTCCTTGTCAGAGGAATTGGCAGGTATTCCCGATCTGCCA 
TGTATTCCANAAAGGCCATGTACAAGAGGAAGTACTCAGCCGCTAAATCCAAGGTT 
GAAAAGAAAAAGAAGGAGAAGGTTCTCGCAACTGTTACAAAACCAGTTGGTGGTGA
CAAGAACGGCGGTACCCGGGTGGTTAAACTTCGCAAAATGCCTAGATATTATCCTA

CTGAAGATGTGCCTGGAAAGCTGTTGAGCCACGGCAAAAAACCCTTCAGTCAGCAC

GTGAGAAAACTGCGAGCCAGCATTACCCCCGGGACCATTGTGATCATCCTCACTGG 
ACGCCACAGGGGCAAGAGGGTGGTTTTCCTGAAGCAGCTGGCTAGTGGCTTATTAC 
TTGTGACTGGACCTCTGGTCCTCAATCGAGTTCCTCTACGAAGAACACACCAGAAA
TTTGTCATTGCC^CTTCAACCAAAATCGATATCAGCAATGTAAAAATCCCAAAACA
TCTTACTGATGCTTACTTCAAAAAGA

Sequence ID 1208 

CCCTATACCTTCTGCATAATGAATTANCTAGAAATAACTTTGCAAGGGAGAGCCAA 
AGCTAAGACCCCCGAAACCAGACGAGCTACCTAAGAACAGCTAAAAGAGCACACCC
GTCTATGTAGCAAAATAGTGGGAAGATTTATAGGTAGAGGCGACAAACCTACCGAG
CCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAGTTCAACTTTAAATTTGCCCA
CAGAACCCTCTAAATCCCCTTGTAAATTTAACTGTTAGTCCAAAGAGGAACAGCTC
TTTGGACACTAGGAAAAAACCTTGTAGAGAGAGTAAAAAATTTAACACCCATAGTA 

GGCCTAAAAGCAGCCACCAATTAAGAAAGCGTTCAAGCTCAACACCCACTACCTAA

AAAATCCCAAACATATAACTGAACTCCTCACACCCAATTGGACCAATCTATCACCC 
TATAGAAGAACTAATGTTAGTATAAGTAACATGAAAACATTCTCCTCCGCATAAG 

Sequence ID - 1209 nt : 620 

CTCTCCTGTCAACAGCGGCCAGCCTCCCAACTACGAGAATGCTGAAGGAGGAGCAG

GAAGTGGCTATGCTGGGGGCGCCCCACAACCCTGCTCCCCCGACGTCCACCGTGAT 
CCACATCCGCAGCGAGACCTCCGTGCCCGACCATGTCGTCTGGTCCCTGTTCAACA 
CCCTCTTCATGAACACCTGCTGCCTGGGCTTCATAGCATTCGGCTACTCCGTGAAG 
TCTAGGGACAGGAAGATGGTTGGCGACGTGACCGGGGCCCAGGCCTATGCCTCCAC 
CGCCAAGTGCCTGAACATCTGGGCCCTGATTTTGGGCATCTTCATGACCATTCTGC
TCGTCATCATCCCAGTGTTGGTCGTCCAGGCCCAGCGATAGATCAGGAGGCATCAT 
TGAGGCCAGGAGCTCTGCCCGTGACCTGTATCCCACGTACTCTATCTTCCATTCCT 
CGCCCTGCCCCCAGAGGCCAGGAGCTCTGCCCTTGACCTGTATTCCACTTACTCCA
CCTTCCATTCCTCGCCCTGTCCCCACAGCCGAGTCCTGCATCAGCCCTTTATCCTC 
ACACGCTTTTCTACAATGGCATTCAATAAAGTGTATATGTTTCTGGTGCTGCTGTG 

ACTT 

Sequence ID 1210 

TTCGTAATTAGAATACTGTTTGGACTTGCTCAACAAGCACCTTATCTTAACAAAAA 
GTAACTTATAGAAAAGGGAGACATTCATTTAACTTCAAGCCCATATTATTCTTAAA 
AGCTGACTCTTGAAATAGTATTTATTGAGTCATAGTGGAGTCATGGGACTTTTTAA 
GGGCCGGAAGGGAGTATTTAGATCATCCAGTCCCAGCCTGTCATTTTATGGAGGAG

GAAACTGAGGCCTAGATAAGATAACCAGTTAGTGGGTCCACTGACCTTTAGGAGAG

TAGTCTATCCGTAAGAGACAACATGGAGAAAGAAATACAACGTTTTTATAGTGAAT 
TATCATCTTACAAAGAATATTCTTCCCATATCGCACTTTTAAAAAGTGGGTACCTT 

AGTCAAATAGGAGAAAAAACCACTTGAGTAGTTTCATCCTCAGGTTTTAGGTGAGG
AAACTGATACTCAGATTAAATAACTTTAAGCACACAGAGCCTGAATGATAGTCTTA

TTTGAGCTCATCTGTGCTTTTAATGTGTACTACGTTAGGTGTTTTCACTTGCATTT 
CCTTTAGTCTTATTTGAGCTCATCTGTGCTTTTAATGTGTACTACGTTAGGTGTTT 
TCACTTGCATTTCCTTGTTTGACGTTGACAATAAATCGTGAAGCTGCCTTATCTAA 

GGAAGTCCTAAAGTAAATCATTGGAACACA

Sequence ID 1211

CCATTGTGTTGGNACCCGGGAATTCGCGGCCGCGTCGACGGAGTTTTACCTTATTA 
CACTTTAATCTCTGGATTTACCCCATCTCATTTCTCTTTTAGGAAAACTGTTTGTA 
TGTGGTGGCTTTGATGGTTCTCATGCCATCAGTTGTGTGGAAATGTATGATCCAAC 

TAGAAATGAATGGAAGATGATGGGAAATATGACTTCACCAAGGAGCAATGCTGGGA

TTGCAACTGTAGGGAACACCATTTATGCAGTGGGAGGATTCGATGGCAATGAATTT 
CTGAATACGGTGGAAGTCTATAACCTTGAGTCAAATGAATGGAGCCCCTATACAAA 
GATTTTCCAGTTTTAACAAATTTAAGACCCTCTCAAACTAACAGGCTTAGTGATGT 
AATTATGGTTAGCAGAGGTACACTTGTGAATAAAGAGGGTGGGTGGGTATAGATGT 

TGCTAACAGCAACAC^AAGCTTTTGCATATTGCATACTATTAAACATGCTGTACAT

ACTTTTTGGGTTTATTTGGAAAGGAATGCAAAGATGAAGGTCTGTTTTGTGTACTT 
TTAAGACTTTGGTTATTTTACTTTTTGGAAAAGAATAAACCAAGAATTGATTGGGC 

ACATCATTTCAAGAAG 



Sequence ID - 1212 nt: 374 

AGAGCAGCAGCCATGGCCCTACGCTACCCTATGGCCGTGGGCCTCAACAAGGGCCA 
CAAAGTGACCAAGAACGTGAGCAAGCCCAGGCACAGCCGACGCCGCGGGCGTCTGA
CCAAACACACCAAGTTCGTGCGGGACATGATTCGGGAGGTGTGTGGCTTTGCCCCG 
TACGAGCGGCGCGCGATGGAGTTACTGAAGGTCTCCAAGGACAAACGGGCCCTCAA 
ATTTATCAAGAAAAGGGTGGGGACGCACATCCGCGCCAAGAGGAAGCGGGAGGAGC 
TGAGCAACGTACTGGCCGCCATGAGGAAAGCTGCTGCCAAGAAAGACTGAGCCCCT
CCCCTGCCCTCTCCCTGAAATAAAGAACAGCTTGACAG 

Sequence ID - 1213 nt : 567 

GAATTATTGACTTTGAATTGCATTTCAGTACCATGAAGTCAAAGTCAGTGGTGTAT 

TTGCTCATTTGTTCATTCTTTCTTTTCGACCAACATTACTGCCTGCAGAGCCAGAG
GTGAGTGCAGAAATpCTGTCAATTCGTC^CTTGTGGACAACCTGCAGCTTGCCACA 
GCCTACAGTTCCACCACTGTGACCTCTGAAAACCTCCTGAACAAAAGGAAGGAGAC 
TTGGAAATCCTGAATGGGCTTGGAGACATTAAGGGAGAACTGCCTCCCTGGACCAA
GGCAGAATTCAATAGAACCAGCAAGAAATTTTCCTATGAATGGGAAAGCAGGTGGC 

AGGGGGCAGGGGTGGAAAAGCTTTGTACAGGA^TTGTGGAAAAGCTTTTGCATTAT
CTCTAGTCTGAAAGTCACATTTCTGjfeTTCC^TCCACTCTCTTCTGTCAACTTGC 
TGTGAGTAAATGACATCTGTCACCTGTGACACGGGCCAGGGACTATCACCATATGG
CCCCCACACATTATCTAGTACCAGCCTGCCTGGGCCATGCCTTTTCCAGTCACTGT 

ACCAGCC 

Sequence ID - 1214 nt: 620

CTCTCCTGTCAACAGCGGCCAGCCTCCCAACTACGAGAATGCTCAAGGAGGAGGAG
GAAGTGGCTATGCTGGGGGCGCCCCACAACCCTGCTCCCCCGACGTCCACCGTGAT
CCACATCCGCAGCGAGACCTCCGTGCCCGACCATGTCGTCTGGTCCCTGTTCAACA 
CCCTCTTCATGAACACCTGCTGCCTGGGCTTCATAGCATTCGCCTACTCCGTGAAG
TCTAGGGACAGGAAGATGGTTGGCGACGTGACCGGGGCCCAGGCCTATGCCTCCAC
CGCCAAGTGCCTGAACATCTGGGCCCTGATTTTGGGCATCTTCATGACGATTCTGC 
TCGTCATCATCCCAGTGTTGGTCGTCCAGGCCCAGCGATAGATCAGGAGGCATCAT 
TGAGGCCAGGAGCTCTGCCCGTGACCTGTATCCCACGTACTCTATCTTCCATTCCT 
CGCCCTGCCCCCAGAGGCCAGGAGCTCTGCCCTTGACCTGTATTCCACTTACTCCA
CCTTCCATTCCTCGCCCTGTCCCCACAGCCGAGTCCTGCATCAGCCCTTTATCCTC 
ACACGCTTTTCTACAATGGCATTCAATAAAGTGTATATGTTTCTGGTGCTGCTGTG 

ACTT 



Sequence ID 1215 

CAC^GATAGAATGGTAAAAAA^AAAAAAAAAAAAAAAAAAAAAAAAAAATTTTA^ 
GTGACAGTGCCATAGTTTGGACAGTACCTTTCAATGATTAATTTTAATAGCCTGTG 
AGTCCAAGTAAATGATCACTTTATTTGCTAGGGAGGGAAGTCCTAGGGTGGTTTCA
GTTTCTCCCAGACATACCTAAATTTTTACATCAATCCTTTTAAAGAAAATCTGTAT 
TTCAAAGAATCTTTCTCTGCAGTAAATCTCGCAGGGGAATTTGCACTATTACACTT 
GAAAGTTGTTATTGTTAACCTTTTCGGCAGCTTTTAATAGGAAAGTTAAACGTTTT 
AAACATGGTAGTACTGGAAATTTTACAAGACTTTTACCTAGCACTTAAATATGTAT
AAATGTACATAAAGACAAACTAGTAAGCATGACCTGGGGAAATGGTCAGACCTTGT 
ATTGTGTTTTTGGCCTTGAAAGTAGCAAGTGACCAGAATCTGCCATGGCAACAGGC 
TTTAAAAAAGACCCTTAAAAAGACACTGTCTCAACTGTGGTGTTAGCACCAGCCAG 
CTCTCTGTACATTTGCTAGCTTGTAGTTTTCTAAGACTGAGTAAACTTCTTATTTT 
TAGAAAGTGGAGGTCTGGTTTGTAACTTTCCTTGTACTTAATTGGGTAAAAGT

Sequence ID - 1216 nt : 484 

CAACCTTAGCCAAACCATTTACCCAAATAAAGTATAGGCGATAGAAATTGAAACCT

GGCGCAATAGATATAGTACCGCA2VGGGAAAGATGAAAAATTATAACCAAGCATAAT 
ATAGCAAGGACTAACCCCTATACCTTCTGCATAATGAATTAACTAGAAATAACTTT
GCAAGGAGAGCCAAAGCTAAGACCCCCGAAACCAGACGAGCTACCTAAGAACAGCT
AAAAGAGCACACCCGTCTATGTAGCAAAATAGTGGGAAGATTTATAGGTAGAGGCG 
ACAAACCTACCGAGCCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAGTTCAAC 
TTTAAATTTGCCCACAGAACCCTCTAAATCCCCTTGTAAATTTAACTGTTAGTCCA 
AAGAGGAACAGCTCTTTGGACACTAGGAAAAAACCTTGTAGAGAGAGTAAAAAATT
TAACACCCATAGTAGGCCTAAAAGCAGCCACCAATT 

Sequence ID 1217 

GACAGGCGGGGGCCCAGCGGCCGGGTGAAGGCCGGGTGGCTCTGTGAATCAAAGGA 

GAGTCCCAGAAAACCTGTGACTGTTGAAGAAAATTCATCTGTGAATTTTTATATTC

AAGGAGTCAGTATTTATATTCATCTTTTAAACTGGGAAGATTTATATTTTACTTTA 
AAACTTCTTGATAATAATTTACAATGAATGGACACAGTGATGAAGAAAGTGTTAGA 
AACAGTAGTGGAGAATCAAGGTAAGTAAGCACTTTGTTATCAATTGTTTACTATGA 
AGAGAGTTGAAAACTTGACTTTTTTCTTTATTGTTATTGTTGTTATTTAGTTTTCC 

TCATAGGTAGCAGAGTTTTCAGGTTTTCCTCTTAGCTATCCAAATACTAAAAAAAT

TCTGATATACGAACCTTTTTTCATAATACAGGTTTTAATTATATTTTTCATTCAGA 
TACACAGTAGATCTTAAATATAGAAAGTTTTTGTTTACTTAAATCTATTTGGAAGT 
TTATATTTGAGCTAATAATTAAGCTGGAGCATGTATAATAGATTTAAATTGTTTTG 

ACTGTTAGTGAAATTT 

Sequence ID 1218 

CTCACTTGGTGGGTGAGCCTCCAATGACTACACCCAAGGAGGATTTAACACAGGGA 
TTTTATGACTTGCAACAAGTCAGGAGGACATGGGGTTGGGGTAGTTCAGCAGTGCC
TGTCTGAACAAAGGTGAAAATTGGGCTTTTATTGGGCTGATCAAGGGGGAGTAAAG 
GCAGCCAGGAGCAGTCGCCTGTCATGCTTCTACCTATATTGCATGTATAGAAAAGG 
GAAAATAAACTCCTTCCTGGGCAGGGTTTTAGTATGCTAAGGAGGGGAGTTATTCA 
ACTTCAATCCAACTCAAGCATCAGCATTGCTGCGTCCATCCCAGTTTTGTTTTGCT
GGGGCTGAACTTCTTCCTATAACTTTTTGAAACAACAAGAACTCAAGGTGTGACAG 
TTACAAGTGGGCCCTTTTTCACAGTGTGTACCTAAACACGTGAGGACCCTGGATTA 
CAGAATGACAGACTCGAAGTGACTCAAGTTCCGGTTGTTCATCTTTAGATGGTAAA 
GATGGCTGTACGTACTATCCTTGCTTATTTCCAATCTATTGTTTAAACTCTTGTAT 

ATGTAATACCGCAGAGGCTAGAGATACAACCTTTGACCAAATGAGTGAATTCAAGT
aatccattactaatjgtgatctggAaacaaacatggtgttgaatgtgcatatgt 

Sequence ID - 1219 nt: 559

CTTGGCAGCTCCGTTATGTGCCCAGCTCTTTGCAAGGGCATACTGGGAAATGAGTG 
GAGATAAAGGACCCAATCATAAGCATTTTACAGTATGGATACCCCATTTTAAAAAG



CAGTGATTCATGAATCAGGCAGCACCAAACCAGAAGGAGGCTTTGCTGAANAAGGA 
TGAGGGACAAGCATTTATAAAGTGAATGTAGATGTAATACAAAGAAAATATTTGAA 
CCGGGTGCGGTGGCTTACACTTGTAATCCCAACACTTTGGGAGGCCAAGGCGGGCA 

GATCACAAGATCAAGAGATCGAGACCATCCTGGTCAACATGGTGAAACCCCATCTN

TACTAAAAAATACAAAAATTANCTGGGCGTGGTGGTGCGTGCCTGTAGTCCCAGCT
ACTTGGGCGGCTGAGGCAGGANAATTGCTTGAACCCGGGAGGTGGAGGTTGCAGTA
AGCCGAGATTGCACCATTGCACTACTCCAGCCTGGTGACAGAGAGAGACTCCATC

Sequence ID 1220

GANl^GTGCGATANNATGNNTGTCTTTTTTTTAAAGTNTTTCNNATNGNAGNGAAN 

CCCCCNNANNTNNCATAANGAGAGATNACTACNGTACANATAGNGNCANACNGATA 

GTAGTANCAANATTGTNTTAGCTANATNANTCAATAGATATCNAGATANAANAANA 

NCI^GGATATACAGCGATGTNTNAITOGGNIINNNI^ANGGAACGAACATCNACNTTA 

ANNATAAGCTNGNGGAGAGAGACANGTANGTTATANANNAGAATNGNAGTAGGNGT

GATC!ATAATAGNNNNNANNTANTATATANGATNTTANTGNNCTNTNNTNNGTTTAT 
CKTCINAATNTCTATNCTNGAGAG^^^ 

CTCNTNATAGANANCTGGTGTCmANAANTACNTCATCTATTNANCTCTCACNANA 
TGGNANNATANAGmGNGNNNTNNANAGGANTANGCATAGNGNNTNNCTNAAACAA 

AANNNATAAGAITOTTCTCGimAANANGG^

ATANTTNTTCNCTCTTl^AATANlTOANGATANATGANCTNGl^GTGATANATANNN 
NlTrACNGTNAANNTNTANTCNTATAATAGATANAAATATAGGATNTTNCTCTGGCN 
GGTNGAANANTTNNTNOJNTT^

NNTTAGAAAGGTACTCTATATACTNNTATGNTNCGGCNNATAATANAACAGATGTT
TGTATNAATATNAAANAAGGTCNNTTTCGNCAAGAGAANNNTGNCTGGTNATAGAA 
TTAGCATAANTTANNTANTATGATNNANTNNTNCTACNANTNTTAGCNNTTNGCAG 

NAGTCATT^GNATNTATlWNGlSnSITA^ 

TNTGNGAANATGAANNTACGNANTCCTNNGNANTATNATNNTGANTANGANAANCN 
ANANNTNTTNTANNANTGNCTATANATTGCCNNGATANATTNTNNNAATGAANCGA 
TAGCCCGCNCTAAGGAIWTIWGTNANNTAAAN^ 
TATTAANCTSIAlNnSrATCACANTAT^ 

TATNACNGNTCCNIWCCGNGAANlTrANTCNTANNAGGCATTCNGNNGAGCTNTTCT 

NCTAGACNATTTNNANTGAAANNATGCNGNNAAAAACGAC2!lNNCTTNAANTTNTGT 

CTACANTCCGCNNTNTTTNTACAGATNGCAGl^AAGl^ 

NGCTNNNACT 

Sequence ID - 1221 nt: 741

AAGCAGAANTNTCTCTAAAAACATTATCTCCTTAAAATCTTGAGGTGCATATNAGA 

GCCACAGGCAATCTCTGACATATAAAATTGCAGTACAGGCCTTTCAAATTTGGCAT 

TTCACTGGTACAATACAACAACCAAGATATATAATAACTGTACAGTGCCTAGACAT 

TCCAGTAAGAACCATTATTTTCTTTAATGTAGAATGATTAATACATATTCTACAAG 

GGGCAGTAAGGTTAGTAATTCTATAGGGTATGTCCCGACATAATTTTCAAATTGTA
CAATAACACAAACAACTTTGTTAAGGCCATGTTTTATTTGCTGATTAATGGACAAA 
AGGCAATGTAATTTATTTTCAAGTATTTTCTTGAAAGTCTGTGCTCATAAAAATCA 
TGAAAAGTTGGAAAGACTGTTAAATCACTGAAACTTCAAATATATCTTACACAATC 
TTGTTTGTACAAAAATACAAGTTAAATATAAACATAAAGCAATCATGGTAATTTTA 

TGCAAATCTGTTTTATGTGATCATCAGTTATATATAAAAGTTTCTCAGTTCTGTTA
TTTGTGAAAAGATCAATACCAGATTGAATGACTACCTATTGGCAAAGGGCCCTAAA 
AAGCTTACTTTAGCACTCATCTTTTACATGGTTAAATGCATTTCCTAATTTGAGAT 
CACCTAAACACTGGAAAAGAAAAAAAATGAAAGGGCAGTATGTCCATAAACCAACA 

AATAATTTGGCTG 

Sequence ID - 1224 nt: 485 

CGAAATTTCCTTGTGACACAGAGGAAGGGCAAAGGTCTGAGCCCAGAGTTGACGGA 
GGGAGTATTTCAGGGTTCACTTCAGGGGCTCCCAAAGCGACAAGATCGTTAGGGAG 
AGAGGCCCAGGGTGGGGACTGGGAATTTAAGGAGAGCTGGGAACGGATCCCTTAGG 
TTCAGGAAGCTTCTGTGCAAGCTGCGAGGATGGCTTGGGCCGAAGGGTTGCTCTGC
CCGCCGCGCTAGCTGTGAGCTGAGCAAAGCCCTGGGCTCACAGCACCCCAAAAGCC 
TGTGGCTTCAGTCCTGCGTCTGCACCACACAATCAAAAGGATCGTTTTGTTTTGTT 
TTTAAAGAAAGGTGAGATTGGCTTGGTTCTTCATGAGCACATTTGATATAGCTCTT
TTTCTGTTTTTCCTTGCTCATTTCGTTTTGGGGAAGAAATCTGTACTGTATTGGGA 
TTGTAAAGAACATCTCTGCACTCAGACAGTTTACAGA 

Sequence ID 1226

GGTTTTTATACTTGCCATGAAACTGTTCTTTGGGATATTATTTTGTTCAGGTTCCC 
CACTTGGACAGCAGAGGGGGTGACTCTGCCCATCCCTGCCACTGGTAGCCAGGCGG 
GCAATGTCTGCTAGCAGTCTGCTTCTGTCTGAACTCAGCCAGCAGAGGCAAACTCC 
CGGTTCCCCGAGAAACACTCTGAAGGCAGGGTGGGTGACTCCACCCACCACCGCCT 

CTCCTAGCCATGCAGGCCATGTCTGCTAGAGCTTCGAGCGCAGTGGTCCTAATTCT
GTCTGAATCCGGCTGAGGGGTGCAGCCTCCTGTTACTGCCCAGGGAAACACCCAGA 
TGGCAGGGTGGGTGACTCCAACCACCTCTGCCTGTGGTAGCCAGATGGGCCACACC 
TGCTAGAGCTTCCAGCCCAGCAGTCCCGCTACTCTGTGGGTGGGTGCCATCCCCTG 
TTCCTCTGGGAAGCACCCAGACAGCTGATTACGTGACCCCACCCACTTCTGCAGAT 

CCTAGCTGAGCAGGACTTGCTGGTTTGGACAATGCCCAAGCAGGGAAGAGCCCTCA
TTCTCTTATCACTGACAGAGGTGAGATGTCCGANTTTGTANGCTGGTGGAGGAGTG
AGGTGGAGGAGGTATGCCTCT 

Sequence ID 1228

GTTATTCAGGTATCCATCAAAATTTTATAAGAGGGCCGGAAACATCGGCTCACACC
TGTAATCCCAGCACTTTGGGAGGCTGAGGCAGGTGGTTCACTTGAGGTCAGGAGTT
CGAGACCAGCCTGGCCAACATGGCAAAACCCCGTCACTATTAAAAATACAAAAGAT 
TAGCTGGGTGTAGTGGCAGGTGCCTGTAATCCCAGCTATTCGGGAGGCCTAGGAAG
GAAAATGGCTTGAACCTGGGGGTGGAGGTTGGAGTGAGGCAAGATCACACCACTGC 

ACTCCAGCCTGGGCGACAGAGCGAGACTCCATC

CAAAAAAACCTTTATCAGATTATCAGAGGTTATCACTACAGAGGGAGGTAAAATTG 
GAGGGAAAAGGGTACAAATTTATTTCAC 

Sequence ID - 1230 nt : 741 

j^GCAGAANTNTCTCTAAAAACATTAT

GCCACAGGCAATCTCTGACATATAAAATTGCAGTACAGGCCTTTCAAATTTGG 
TTCACTGGTACAATACAACAACCAAGATATATAATAACTGTACAGTGCCTAGACAT 
TCCAGTAAGAACCATTATTTTCTTTAATGTAGAATGATTAATACATATTCTACAAG 
GGGCAGTAAGGTTAGTAATTCTATAGGGTATGTCCCGACATAATTTTCAAATTGTA 
CAATAACACAAACAACTTTGTTAAGGCCATGTTTTATTTGCTGATTAATGGACAAA
AGGCAATGTAATTTATTTTCAAGTATTTTCTTGAAAGTCTGTGCTCATAAAAATCA 
TGAAAAGTTGGAAAGACTGTTAAATCACTGAAACTTCAAATATATCTTACACAATC 
TTGTTTGTACAAAAATACAAGTTAAATATAAACATAAAGCAATCATGGTAATTTTA
TGCAAATCTGTTTTATGTGATCATCAGTTATATATAAAAGTTTCTCAGTTCTGTTA 
TTTGTGAAAAGATCAATACCAGATTGAATGACTACCTATTGGCAAAGGGCCCTAAA 
AAGCTTACTTTAGCACTCATCTTTTACATGGTTAAATGCATTTCCTAATTTGAGAT 
CACCTAAACACTGGAAAAGAAAAAAAATGAAAGGGCAGTATGTCCATAAACCAACA

AATAATTTGGCTG 

Sequence ID - 1231 nt: 203

TTGAGGAAGGGTCTACTGTCTTTTTAAATGGCACAATTTTAAGAGGTTTGAGAGGT
ACAGTCCCTTAACCTGCCACGGGAGAGGGGCCCCCAAACTTTCTTCCCCCCACACT
TCTGGTTTTCTGTGTGGAGGGGGAGCAGGGATATCTAAGCTGTGGTGTGAAAGGGT 
AGGAGAGATGCTGGAGGTGGGGGTGCTGTGTTCTA 

Sequence ID 1239

TTTCCTCGGGAAGCGCGCCATTGTGTTGGT^CCCGGGAATTCGCGGCCGCGTCGAC

3TTTTATTTATC 



TACAZ^AATATAGCAATACAGNGAACTTCACCAAATCCTAAATATTCAGTACCTGA 
ACTGGCTACAACACCGNGTGCACACCCAGTTCCTGCAGAATCTCTTGCAGATATGG 
GAGAGTCAGCCAGTGAAAAGATCCATTTCTTGGGAATCCTTGTCAACAAGACCAGT 

TCAGAAATCCAGGATATATAGAAGCCTACTGTAATTTAAAAACAGTAACAAAAACC
CCAACAAAACCCAAATCAACAAAGACCAAGATAAAGGNGTGATAAACATTAATTGT 
AATGGTTTTCCTTTACATGCAATACATGCATTTTAAAATCACTAAGAAACACGAAA 
TTTTGTAGAGCAAAGTTTGNGTTTCACGTAAGTGCAAATGAATATATATTTTATTT 
TTTATACTATTAAATTATATATATTTTTTCCATACAAAAGCACACAGTGTTAATCT 

ATAAAATGACATCCAAGTGGATGATGATTGTTTTTGCATGTCCCCCTGCTTAGATT
TTTTTAAAATATATAGTCAAAAATTAACATCCTTCTTTAAAAATACAGAAGGGAAA 
AANGGGCAAAAAAAAAAATCTAGACTCGAGCAAGCTTATGCATGCATGCGGCCGCA 
ATTCGANCTCGGNCGACTTGGCCAATTCGCCCTATAGNGAGTCGNATTACAATTCA 
CTGGGCCGNCGNTTTACAACGTCGNGACTGGGAAAACCCTGGCGTTACCCNNCTNA 

TCGNCTTGNAACAATNCCCNTTTNGCCAGNGGGG



Sequence ID 1255 

TCACTTCGTATNGAANCTGTTTGGACTTGCTCAACAAGACCTTATCTTAACAAAAA 
GTAACTTATAGAAAAGGGAGACATTCATTTAACTTCAAGCCCATATTATTCTTAAA 
AGCTGACTCTTGAAATAGTATTTATTGAGTCATAGTGGAGTCATGGGACTTTTTAA 
GGGCCGGAAGGGACTATTTAGATCATCCAGTCCCACCCTGTCATTTTATGGAGGAG 
GAAACTGAGGCCTAGATAAGATAACCAGTTAGTGGGTCCACTGACCTTTAGGACAG 
TAGTCTATCCGTAAGAGACAACATGGAGAAAGAAATAC?^ACGTTTTTATAGTGAAT
TATCATCTTACAAAGAATATTCTTCCCATATCGCACTTTTAAAAAGTGGGTACCTT 
AGTCAAATAGGAGAAAAAACCACTTGAGTAGTTTCATCCTCAGGTTTTAGGTGAGG 
AAACTGATACTCAGATTAAATAACTTTAAGCACACAGAGCCTGAATGATAGTCTTA 
TTTGAGCTCATCTGTGCTTTTAATGTGTACTACGTTAGGTGTTTTCACTTGCATTT
CCTTTAGTCTTATTTGAGCTCATCTGTGCTTTTAATGTGTACTACGTTAGGTGTTT 
TCACTTGCATTTCCTTGTTTGACGTTGACAATAAATCGTGAAGCTGCCTTATCTAA 
GNAGTCCTAAAGTAAATCATTGGAACACATGTANCCAGTTTGTTGTTTTTATTTGC 
CAGGTNTCAAATATAACTGAAAACCCATGCTAACTGACTNATTTTAAAAGNTGTNT 
GGGGCATGAAANGATTGCTCTGCCTGGGCGGGNG.QTTNANCCTGNGTCCCCCOT
NGGAGNCCACCCANGANGCGATA 
NAACCCCNTTTTTAAANANAAAANANCGGNNG 

Sequence ID 1256
TTGTGTTGGTACCCGGGAATTCGCGGCCGCGTCGACGGAGTTTTACCTTATTACAC
TTTAATCTCTGGATTTACCCCATCTCATTTCTCTTTTAGGAAAACTGTTTGTATGT 
GGTGGCTTTGATGGTTCTCATGCCATCAGTTGTGTGGAAATGTATGATCCAACTAG 

aaatgaatggaagatgatgggaaAtatgacttcaccaaggagcaatgctgggattg 
caactgtagggaacaccatttatgcagtgggaggattcgatggcaatgaatttctg 
aatacggtggaagtctataaccttgagtcaaatgaatggagcccctatacaaagat
tttccagttttaacaaatttaagaccctctcaaactaacaggcttagtgatgtaat
tatggttagcagaggtacacttgtgaataaagagggtgggtgggtatagatgttgc
taacagcaacacaaagcttttgcatattgcatactattaaacatgctgtacatact 
ttttgggtttatttggaaaggaatgcaaagatgaaggtctgttttgtgtactttta 

AGACTTTGGTTATTTTACTTTTTGGAAAAGAATAAACCAAGAATTGATTGGGCACA

TCATTTCAAGA^GTCCCCTCTCCTCCACATTTGTTTTGCCAATTTGCACATTAAAT 
GACTCTTCCCTCAAATGTGTACTATGGGGTAA?^AGGGGTAGGGNTTAAANATGTAA 
ACAGTTGGGTTTTTTAAGGGNCCTTTTTCATAACTGGAACACTCTNTACAAGGNTN 
CTTNTTAAATAAATAACTTGACTTTTTTGTTTTNTAAANGNANCTTCNTGCTTCCA 

TAAAAAAAAAAATTTAAl^NGNCANCTNTGCTGCTGCGNCCANTTNGCTNGNCCNT

GGCATTCCCTAGGGANGKTmAfiLTANTGGCNNOT 
NT 

Sequence ID 1331 
GGGCGATGCATGCTTTATTAAGGCTCTTGTTTCACCTGGCAGTGTACTGTATCAAC
GTATAATAC^GAAAAAJyyVTCTCTTTAAGGTCCTCCTTCACAAAGACATAGAGTGA 
AACTCCCTTTACATGTCAGTATTTGTTCAACACTTTAGGCAACTTGACTGTCAGTG 
TTAAAATGGAAAACAGGAAAATGGAAAAATCTGACCAATTCTGCCACCTTGAGACT

TTCATATAGACCTTGCACAACAATTGTATAGATCACACACCGGCTGTATTTAATAT 

GTAACATTTTCACACATATTAAAGATACAGAAGTATTAAAAAACCCCCAATGTTAA 

TGTATTTGCTTAAAAGGCACAA.GTTTCACATATCTGTCTAGCTATCTGTTGGTAAT 

ACAGAAAGTATACTACTTTTTTAAAAAAGTGGGCAGAATTCTTGTGTATGTATATT

TGTGTGTACAGTATGTGTATGTGTGTATATATATATATTATATATATAGATAATAT 

GTGAAGAAATATTTCAAAATGGCCATAAAGGAGGTAAAAATGAAAACCATAACCTA 
ACTTTTATAGAGGCTTTATCTTTAATTTAACGATGTGCGGAGGACTTTCTTGCTTG 
AATCTGTTCCGGGCTGTCTGCTCTGTCCATCAAATGGGCAGGTCTGGGAATGAGGC 
ACCTTC^CCGTTCAGAAGTGGCCTGAACAGAATGCTGGAACCCAGGCTGGACTCG 

GAC 

Sequence ID 1332

CAAACCTGCATGTTCTGCACATGTATCCAGGAACTTAAAAAAAAAAAAAGATAGTT 
TGTGTGTCTTAATTGAATAATAGTAGATTTATAGATTAAAGATCTATGGGTTTTTA 
ATATGGATTAGAAATCTGTGGGTTTTTGATATGGATTAGAAATCTGTGGGTTTTTA 
ATATGGATTGGAAATCTGTGGGTTTTTAATATGGATTAAAAAACATCTGTGGGTTT 
TTAATATGGATTAAACATCTGTGGGTTTTTAATATGGATTAAACATCTGGGTTTTT 
AATATGGATTAAACATCTGTGGGTTTTTAATATGGGTTAAAAATCAAAAGAAAATG
AACTATTTGCTCCAGTGCAGGAAAATACAGGCAATACTGGATACAATTAGATGGTC 
AGGAGCGATAACCCGGTTGCCATTGTTTGAAGAAGAGAATAAGGTGCTAGCATTCC 
TATCCGTAGATAATTTGACAGCTAGGAAATAGGGGGAGTCTTCTATGTAGTTAGTG 
AAGGCTAAATGAACTATTATATGCAGTTATCGTAGAAGAGTACTC^AAAAAATCTG 
TAAAAAATAAAGAAAGGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTG
GGAGGCCGAGGCGGGTGGATCATGAGGTCAGGAGATCGAGACCATCCTGGCTACCA 

NGGTGAAACCCCCGTCT 

Sequence ID 1335 
CAAGACTCCATCTCAAAAAAAAAAAAAAATCTACAGTGCTGAGTATATAAAATTAT

TAACACATTTCACAACAATATGTGTTTGTGGAGTTAAATATTTTTTGTCTTTAAAA 

CAGGTAATTTTAGTGCATACTTAATTTGATGATTAAATATGGTAGAATTAAGCATT 

TTAAATGTTAATGTTTGTTACATTGTTCAAGAAATAAGTAGAAATATATTCCTTTG 

TTTTTTATTTAAATTTTTGTTCCTCTGTAAACTAAAAGAACACGAAGTAATTGGTC 

ACAATTACTGGTGTTTAACTGCCAAATATGGGTAAATAAGGGAAAATTTTGTTTAA

TATTTAGTCCTTCTGAGATGGCTTGAATATTTGAATTTTGTTGTACGTCTATACTG 

GGTAGTCACAAGTCTTATAAACACTTTAGAGGAAAGATGGATTTCAGTCTGTATTT 
TTAAACATCATTTATTTTAAATCTGGTGCTGAAAAATAAGAAAAAAATTAAACTGC
ATTCTGCTGTTCTTCTTTAGAAGCATTCCTGCGTAAATACTGCTGTAATACTGTCA 
TGCAAAGTGTATCCTTTCTTGTCGTATCCTTTTTGGGGCAGTGTTTTTTTGTTTTT 
TTCCTAGAAATGTTTGTCCTTCCCCCACCTGTTGATCCAGGTTAAGGAATACTTTT 
TTACACTTTATTCAAA

Sequence ID 1336 

CTTTTCCTCCCGCTGTCCCCCACGGAGGGGACTGCTCTCCCGCGGTGCATCCTTTC
TGTGAGGTACCTTACCCACCTCAGCACCTGAGAGGGTGAAATAGAATTCTAACCTC 

GACATTCGGGAAGTGTTTTTGAGAAGTCTCGGTCGGTAAGGGAAGTCTTCCAAGTC
CGTGCAGCACTAACGTATTGGCACCTGCCTCCTCTTCGGCCACCCCCCAGATGAGG
CAGCTGTGACTGTGTCAAGGGAAGCCACGACTCTGACCATAGTCTTCTCTCAGCTT 
CCACTGCCGTCTCCACAGGAAACCCAGAAGTTCTGTGAACAAGTCCATGCTGCCAT 
CAAGGCATTTATTGCAGTGTACTATTTGCTTCCAAAGGATCAGGCCCTGAGAACAA 

TGACCTTATTTCCTACAACAGTGTCTGGGTTGCGTGCCAGCAGATGCCTCAGATAC
CAAGAGATAACAAAGCTGCAGCTCTTTTGATGCTGACCAAGAATGTGGATTTTGTG 
AAGGATGCACATGAAGAAATGGAGCAGGCTGTGGAAGAATGTGACCCTTACTCTGG 
CCTCTTGAATGATACTGAGGAGAACAACTCTGACAACCACAATCATGAGGATGATG 
TGTTGGGGTTTCCCAGCAATCAGGACTTGTATTGGTCAGAGGACGATCAAGAGCTC 

ATAATCCCATGCCTTGCGCTGGTGAGAGCATCCAAAGCCTGCCTGAAGAAAA

Sequence ID 1337 
CAAGAACTCTGGGACATTTGCAAAGGGTATGGCATATGTGTAATGGGAATACCAGA

GGAGAGGAAAGACAGGAAGTCAAAAAAAGAATTTTTCCAAATTAATGATAGGTTCC 

AAACCACAGATGCAGGAAGCTTAAACACCAACAGGATAAATAA^CAAAATCTACG

CTTAAGCATATCATACTTAACCTGCAGAAAATTACAGACAAAGAAAAAACACCAGA 
GGGGAAGCTGGCAGAAACATACCACCTATAGCGGAAGAAGAATAAGAATTACATCA 
GACTTCCCTTCAGAAATCTTGCAAACAAAAAGATGTAGCACAATATTTAAAGTATT 
AAAGGAGGCCGGGCCCGGTGGCTCGGGCCTGTAATCCTAACACTTTGGGAGGCTGA 

GGCAGGAGGACCATGAGGTCAGGAGATCGAGACCATCCTGGTGATGGTGATACCCC

ATCTCTACTAAAAATACAAAAAATTAACCGGGCATGGTGACACGCACCTGTAATCC 
CAGCTACTTGGGAGGCTGAAGCAGGAGAATCGTTTGAGCCCAGGAGGTGGAGGTTG 
CAGTGAGCCGAGATCACATCACTGCACGCCTGGGCAACAGAGCGAGACTCCATCTC 

AAAAAA 


Sequence ID 1338 

CGACCCGTTTTAGTCAGGATGGTCTCGATCTCCTGACCTCGTGATCCGCCTGCCTC 
GGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCGCCCGGCGTAAATCAG
GTTTTTTAAATGTTTGCCAAACCTTATCACTGACTTTTATAACAAAATTATTTACT 
ATAATCATTAGGGAATATTTAAGTTCTGCTAATACTTAAAATTGCAGAGTGCTAAA 
ACCAGCAGTGAGTTTAGAATCAAGCTAAGCTTTATTGTTGCTACTATTTGAGGCAT 
ATTAGTTGACTGGTGTTCATATGCAAGGCAGTCTACTGGGTGCAACAAGGGTTAGA
AGGATATTTTTAAAAAACTGACCCTATTCTCAGGATGAAAATAATACACTAGTAAT 
AGTCTGCTCTGTTGGTTAACTCCTCGTAAGGAGGTACAATTAAAATGCTGTAGTGT 
TGCAAGGGAAGGAGAGGAAGAATCATATTCCTTCACTAGCAGGATCAAGAAAGCTT 
TTATAGAAATATACAAAATCTTCACTTCTTGAAGGATTGGTAAAATTTAATAGCCA 
ACATTGGGCACTTATTCATTCTCTGAGTAAATATTTATTGCATGCTTATCTTGTAT
CAAGCATTGTGATGAAAGCACAAGAATGAAAGAGGAGGGAGAATGTTTAGAGAATA 
AGGGCTGAAACACAGATTTTGTAGGGAGCGTAGGGGAGACTGANAAGACAGGTTCA 
GGTTAGTAAGGGCGCTCATATTTTGACCCTGAATGTTAACTATGTGCACATCATGC 
TAGCTATTCTAAATC^GGCATTTTCAAATGGAAGCAGGCACTGACATTTT 


Sequence ID 1344 

CGTGAAGGGTCTTTATGTATTAGTATTAGAGTGATCTTTTGATTATTTTCCTCACT 
ATAAGGAAATTATTTCCTCAGGATGAGCTGCCATAACATTCCACTGTCTGATGGCA 
ATTTTAAAGCCTGAAATTGAAGCCCATGGCTAGGCTATGAGAACCCTAGTTCGTAT 

AGTAAAGTTGATATCTTCTGGATGTATACTAATTTTAGGCTTTATTTTAAAACTGC
TGGAAACTGAAACTTAGACAAAAGTATTTTCAGGACATCATTTACAATGTTTAGCC 
CTAAAGAGTCAAGCTGTGGGATTCTGAGTCTTTCATATGTTACAGCAGAAACTTAA 
AAGCAAGAGGAAATTGGCTGGGCACAGTGGCTCTGTAATCCCAGCACTTTGGGAGG 
CTGAGGTGGGTGGATCATGAGGTCAAGAGATTGAGACCATCCTAGCCAACATGGTG 

AAACCCCATCTCTACTAAAAATACAAAAATTAGCTGGGCGTGGTGGCACACGCCTG
TAATCCCAGCTAGTCAGGAGGCTGAGGCAGGAGAATATCTTGAA.CTTGGGAGGGAG 
AGGTTGCAGTGAGCCAAGATTACATCACTGCACTCCAGCCTGGTGACAGAGCGAGA 

CTCCGACT 



Sequence ID 1348

CTGAAACTGCACTGAACCCACAGGTAGGTTACATCACAGGACAGAAATCTGAGGAG 

CTGGAGAAAGCAAAAGAATAAAGGATGGGCTGACACCAGAAGGAATTAAAGGAATT 

TTTATACTGAACTTCAATTACTTGTTCATTTGAAGTTTGTTTTTTTAATGAACGTT 

TTTGCTGTTACTTAAATATAGTGTTTTGAAAGTGTTTCAAATGTATTCAAGTTGGG 

ATTTTCCATATTTTACTACAGTTCTGTCTTAGTATGTTCACCATAAAACACTTATC

ATTAAAGCTCACAAAGTGCTTTTTTGTAATATGAGGATAAAATGAAGCCATATAAG 

AATTTTTTTATATCTGTACATTTAACCCACATTTGAGCTTTAGCCAAAATATATAG 
CTTTTTTTTTTCTGACCTGGCCAACGTATTATCCAGCAAACATCAACTGAAGCAAT
ATGGAAACACTTCCAAATGTTTGCCAATAATGCTATTAAGTGACTGATGTCAACAT 
TAGTTACATGGCAAACTAAAGAGGCATTATACATTTTTAAAACACACTAACATATA 
ACTGTAGATAATGTAAGGTTTATTTATATGCATATTTGATAGTATATTTAAATGTT 

TAAATATAAAAAAGGGTTTTTAAACACTTTTAATTTTTATCTTTGATTTTTTTTAT
TGATATCTCTTTCCAGGCTACTAATAAAATTGCCAGAACTAAACTATCAGGTAAAG 
GTTAAGGCATCAATTGACAAGTAAGTTTTCTAATTTCGTTTTGAATTACAATTCCA 
AATGTAAGACTTTTAAAAATGAATGGCCTTTATTTTATAGAATAATTTTGACCTTT
TAAATTTACTTATCTAACATTATATAATGAATGTACTTCAAATATTTGACTTTGAA 

GTCAACATTAACAAA^TCATGGATCCTAATTAAAATTTACTATAAAACTGGAAT
TTTATTACTTCCTTJ' 

Sequence ID 1351 

TTTTTTTTTTTTTAAAAGAGATGGGTTCTCACTATGTTGCCCATAATGTTTATGAG 
ATTAAGTTCATCTTTTTTATCTGAGTAGTATTTTATTGTATGAATATACCACCATT
TATTTATCTGTTGGTTATTTCCAGTTTTGGGCTATAATCCAAAATGCTTTT 
ACAATAGGCTATATATCATTAATGTCCGTTTATCAGCAGTATAAAATATCTTACCA 
TAAATATTAATAAAAGAAGCATTCATATATAAAATATAGATATTTCAAACCCTACA 
GAGGGCCTTTTAATGATTAAATATTTTGTCCTTACAAAAAGGTCCAGGTAATTACA 
CCCATGAGGTTAACCTGCCTTAGTGCAGGACTTAAAATAAGGCTTCTCCTGCCATC
TCTCTCCATTTGTAGAATGTGAAATTCTTTAAAATGCATCCTATATTAGGAATACT
ATAGCTGTGCACTGGTGTTTGTTCTCTTCTTTAAACTCGGGACCGTATATATCTGC
TCAAATTGCCCAAGTATACATATGCTGCACTCCATCAAGTGTCAGGCCACATTCTA
TCAGCACAGCGTGACTGCCTATCAGTGACAATATAAGTGAGCTCTATTTGGATCCC 
TCTTACCCTACCTTTTATATTTATGACAGCATTATCATAAAACTCCAATATTCTTC
AATAACTTACATGTTTGTTGTAGGATAAAATTATTACCCTCAATGAACTACAT 



Sequence ID 1352 

ACCAGCTTCTTCACAGGTTCCACGAGTCATGTCAACACAGCGTGTTGCTAACACAT 
CAACACAGACAATGGGTCCACGTCCTGCAGCTGCAGCCGCTGCAGCTACTCCTGCT 
GTCCGCACCGTTCCACAGTATAAA 

TCTTAATGCACAGCCACAAGTTACAATGCAACAGCCTGCTGTTCATGTACAAGGTC 
AGGAACCTTTGACTGCTTCCATGTTGGCATCTGCCCCTCCTCAAGAGCAAAAGCAA 
ATGTTGGGTGAACGGCTGTTTCCTCTTATTCAAGCCATGCACCCTACTCTTGCTGG 
TAAAATCACTGGCATGTTGTTGGAGATTGATAATTCAGAACTTCTTCATATGCTCG 
AGTCTCCAGAGTCACTCCGTTCTAAGGTTGATGAAGCTGTAGCTGTACTACAAGCC 
CACCAAGCTAAAGAGGCTGCCCAGAAAGCAGTTAACAGTGCCACCGGTGTTCCAAC 
TGTTTAAAATTGATCAGGGA.CCATGAAAA.GAAACTTGTGCTTCACCGAAGAAAAAT
ATCTAAACATCGAAAAACTTAAATATTATGGAAAAAAAACATTGCAAAATATAAAA 
TAAATAAAAAAAGGAAAGGAAACTTTGAACCTTATGTACCGAGCAAATGCCAGGTC 
TAGCAAACATAATGCTAGTCCTAGATTACTTATTGATTTAAAA 


Sequence ID 1353 

ACATTCTGGAAAAGGCAAAAGGGAGGAAGAACTGATTAGTGGTTAGCCCAGGGTTA 
GAGTTGGGGAGAGGATATAATGAGGGAACTTTTGTGGATTCTGTACCATGATTATG 
ATTACACAAACCTATGCATACATTGAAACACATAGAACTATACGTTGAAAAAAGTG 

AATCTGCCTGTATGTAAATTTAAAAGAAAAATATTTTTTTAAAAAAACAGATGCTT
CTTAACACATTATCATCTATGTCAGTTTAACAGTTAGTAGACTTAGGCCAGGTGTC 
ATGGCTCACTCCTGTAATCCGAGTGCTTTGGGAGTCTGAGGTGGGACGATCTCTTG 
AGACTAGGAGGGAGTTTGAGACAAAGCTAGGCAATGTAATGAGACTCTTTCTCTAC 
AAAAAATTTTAAAGTTATCTGGACATGGTGGTGCCTGCCTGTAGTCCCAGCTACTT 

GGGAGGCTGAGGTGGGAGGATTCCTTGAGCCCAGAAGTTCAAGGCTACAGTGTGCT
ATGATAGAGCCACTGCACTCCAGCCTGGGCAACCAGGTGAGACCTTGTCTCTAAA^. 

TGAATAAATAAAT 

Sequence ID 1355

TGGTCTTTCACCCAGCCAGGGAGAAGGTTCTTCGCTCAGTATGAAGAAAAGCAACC
CAAAACTCTCAATCTGATTTGTTTTTGTTTATGTCGATGCCCTGTAGTTTGAAAGT 
GAAGTAAAGATTTAGAATTCACCTAAGTCCAAAGGAAAACACGTGGTTTTTAAAGC 
CATTAGGTAAAAAAAGTTCTCAATAAAGGCATTACAATTTTTTAGGTTTAGAAAGA 
TGGACTTTTCTGATAAATCTTGGCAGACATCTAAAAAAAAAACCATATTTTTCACA 

AGAAAATGCAAGTTACTTTTTTTGGAAATAATACTCACTGATTATGGATAAAATGG
AATATTTTCAGATACTATATTGGCTGTTTCAAAATAGTACTATTCTTTAAACTTGT 
AATTTTTGCTAAGTTATTTGTCTTTGTTGTATCTATAAATATGTAAAAAATATTTA 
AATAGATGTACCTGTTTTGCTTTCACACTTAATAAAAAATTTTTTTTTGT 



Sequence ID 1359

CGGGATCCCTAGTATAACACATTCAGTGTTCCCCTTTCAGTCTTACTACTTTGACC 

GCGATGATGTGGCTTTGAAGAACTTTGCCAAATACTTTCTTCACCAATCTCATGAG 

GAGAGGGAACATGCTGAGAAACTGATGAAGCTGCAGAACCAACGAGGTGGCCGAAT 

CTTCCTTCAGGATATCAAGAAACCAGACTGTGATGACTGGGAGAGCGGGCTGAATG 

CAATGGAGTGTGCATTACATTTGGAAAAAATGTGAATCAGTCACTACTGGAACTGC

ACAAACTGGCCACTGACAAAAATGACCCCCATGTGAGTATTGGAACCCCAGGAAAT 

AAATGGAGGAAATCATTTGCCTTAGGGATTGGGAAAGCTGCCCACTAACTGTCTTC 
CCCATTGTTTTGCAGTTGTGTGACTTCATTGAGACACATTACCTGAATGAGCAGGT
GAAAGCCATCAAAGAATTGGGTGACCACGTGACCAACTTGCGCAAGATGGGAGCGC 
CCGAATCTGGCTTGGCGGAATATCTCTTTGACAAGCACACCCTGGGAGACAGTGAT 
AATGAAAGCTAAGCCTCGGGCTAATTTCCCCATAGCCGTGGGGTGACTTCCCTGGT 
CACCAAGGCAGTGCATGCATGTTGGGGTTTCCTTTACCTTTTCTATAAGTTGTACC
AAAACATCCACTTAAGTTCTTTGATTTGTACCATTCCTTCAAATAAAGAAATTTGG 
TACC 



Sequence ID 1360






TGCGCAGACCAGACTTCGCTCGTACTCGTGCGCCTCGCTTCGCTTTTCCTCCGCAA 
CCATGTCTGAGAAACCCGATATGGCTGAGATGGAGAAATTCGATAAGTCGAAACTG 
AAGAAGACAGAGACGCAAGAGAAAAATCCApTGCGTTCCAAAGAAACGATTGAACA 
GGAGAAGCAiCGCAGGCGAATCGTAATGAGGC6TGCGCCGCCAATATGCACTGTACA 
JTTCCACAAGCATTGCCTTCTTATTTTACTTCTTTTAGCTGTTTAACTTTGTAAGAT
GCAAAGAGGTTGGATCAAGTTTAAATGACTGTGCTGCCCCTTTCACATCAAAGAAC 
TACTGACAACGAAGGCCGCGCCTGCCTTTCCCATCTGTCTATCTATCTGGCTGGCA 
GGGAAGGAAAGAACTTGCATGTTGGTGAAGGAAGAAGTGGGGTGGAAGAAGTGGGG 
TGGGACGACAGTGAAAT 


Sequence ID 1361

TATAAATACACTCCX3GGATGATTTACCCCCGGAGGTCAGCTAGTAAAATACATGAG 

TAGAATTCCTTAAAGTATGTGATAATTGCTCATCACTATCCAAGTGTGACATAAAT 

CATAAAAAGAATTGACAAAATCAGGGTCGCAAAGAGAATTGAAAAAAATCTGTCAC 

AACCAAAATTTAAATTGACCTCTGTCCTAGAGTATGAGAGCCACACTGAACAGAAA

AACCAGATAAATCTTTTATAAAATAtTCATTTGCAGCCCCATTAACGTTGCTTGTC 

ACCCCACCTCCCCATGTCCTTGGACAAACTGAATGTATAGTAACATCATCCCAGGC 

CAGGCGCGGTGGCTCATGCCTGTAATCCCAGCACTTTGTGAGGCTAAGGCAGGCAG 

ATCAGGAGGTCAGGAGTTCAGGACCAGCCTGGCCAZ^AAAGGTGAAACTCCGTCTCT 

ACTAACAATACAAAAATTAGCTGGGTGCGGTAGTAGGCGCCTGTAATCCCAGCTAC

TCGGGAGGCTGAGGCAGGAGAATTGCTCAJ^ACCCGGAAGGTGGAGGTTGCAGTGAG 

CTGAGATCGTGCCACTGCACTCCAGCCTGGGTGACAGAGCAAGACTCTGTCTCGGG 

GAGGGGGGTGGCGGAGATAAAGAAATAACATCATCTTATACTGTCAAGCTCAAGGT 

GTCTGCAGCCTTATCTTCAGGGGAAGTTGTGTCTTTCTCAGGGAAGATACAGATTT 

CAATTTAGAGCAAGACAGAGAGAAGTTACATTCAGAGAGGAAAATGCAGTAGTCTA

ACTG 
Sequence ID 1364

GCGGCCGCGCTCTTTTCAATTTTTAAAAAGAAGTTTGTTTTCCATTTCAGTAATTT 
CTGCTTTGATCTTCCTTATGTCCTCCTATTGAGTTGATCAGCTTTCTTTATTCTTG 
CCTTTTCTCCTCTGTGTGCCCTTTCTATTAACGTATTTACCCTTAGGCTGGGCACA 
ATGGCTGATGCCTGTAATCCCTGCACTTTGGGAGGCCGAGGCAGGTGGATCACCTA 
AGGTCAGGAGTTCAAGACCAGCCTGGCCAACATGGTGAAACCTGGTCTCTACTAAA 
AACACAAAAATTAGCCAGGCATGGTGGTGTGCACCTGTAATCCCAGCTACTCAGGA 
GGCTGAGGCAGGAGAATTGCTTGAACCTGGGAGGCGGAGATTGTGCCAAAGCACTC 
CAGCCTGGGCAACAAAATGAGACTTTGTGTC 

Sequence ID 1365 

CACCAGGCTGTCTTCAGATACTTCATACAGAAATGAGCCTCCCTGTGGGGTCCTCT 
TCCGTCCTTCAGGCTGTCCATCAACACAGCATTGCGGGATCCTTACCATGGCATCC 
AGCCCTGGAGATGCTTCAGGAAAGTTGCAGGTCCATGCTGCAGGACAGGCTCAGAT 
CAGCA.GAGACGCATCTCACATCGGGCTGTGAAATTCAAGTTGAGCTGCAATTGGCA 

ATGAGAA 

Sequence ID 1366 

GTTATTCACTGAGACCGTGCCCCGGTTATGAGGTTGTACCAGAT^AGCAAGTATTCA 
CTATGCACACTATTCACCGCTCACCCTAGCATTGAAGCCAGCCTGTAGCCTGAAAG 
CCTTTGCTTTGAGGGCAGGTCTTTCCCCAAAATGCAGACACGAAGGTGCAAAGTGA 
AGCTGCCAGTCTTGCAAAAGATGTAACTTGTCACGAAGGCCACGAGTGGCAGGGAG 
AGCTGTCCCACATTTGCGGAAGTGGCTATGTGAGGACGGGGGAGGCGGGTCCCTTA 
GAGATGAGACAATCATAAGGGGAGATATCAGAGAAAATCGTAAGGGGAGCAGATGG 
TTGTCAAGAGAATAGGCTGACCATCGAAGGACTGGCAGAAGCTTTCAGAAAACCAC 
TGGACGGCTGGGCACAGTGGCTTAGGCCTGTAATCCCAGCACTTTGGGAGGCTGAC 

GCAGGTGAATCACTTGAGGTCAGGAGTTCCAGACCAGCCTGGCCAACATGGTGAAA 
CCCCATCTCTACAGAAAATATAAAAATTAGCCAGGCGTGGTGGCACAAGCCTAGAA 
TCCCAGCTACTTGGGAGGCTGAGGCAGGCGAATGGCTTGAACCCAGGAGTCAGAGG 
CTGCAGTGAGTCGAGATTGTTCCACTGCACTCCAGCCTGGGTGACAGTGCAAGACT 

CCTTCCAAAAAAAAA 
Sequence ID 1367 

TTCGTGAGTGATGGCGTCCCGGGTTGCTTGCCGGTGCTGGCCGCCGCCGGGAGAGC 
CCGGGGCAGAGCAGAGGTGCTCATCAGCACTGTAGGCCCGGAAGATTGTGTGGTCC 
CGTTCCTGACCCGGCCTAAGGTCCCTGTCTTGCAGCTGGATAGCGGCAACTACCTC 
TTCTCCACTAGTGCAATCTGCCGATATTTTTTTTTGTTATCTGGCTGGGAGCAAGA 
TGACCTCACTAACCAGTGGCTGGAATGGGAAGCGACAGAGCTGCAGCCAGCTTTGT
CTGCTGCCCTGTACTATTTAGTGGTCCAAGGCAAGAAGGGGGAAGATGTTCTTGGT 
TCAGTGCGGAGAGCCCTGACTCACATTGACCACAGCTTGAGTCGTCAGAACTGTCC 
TTTCCTGGCTGGGGAGACAGAATCTCTAGCCGACATTGTTTTGTGGGGAGCCCTAT 
ACCCATTACTGCAAGATCCCGCCTACCTCCCTGAGGAGCTGAGTGCCCTGCACAGC 
TGGTTCCAGACACTGAGTACCCAGGAACCATGTCAGCGAGCTGCAGAGACTGTACT 
GAAACAGCAAGGTGTCCTGGCTCTCCGGCCTTACCTCCAAAAGCAGCCCCAGCCCA 
GCCCCGCTGAGGGAAGG'GCTGTCACCAATGAGCCTGAGGAGGAGGAGCTGGCTACC 
CTATCTGAGGAGGAGATTGCTATGGCTGTTACTGCTTGGGAGAANGGCCTAGAAAG 
TTTTGCCCCCGCTGCGGCCCGAGCANAATCCAGTGTTGCCTGTGGCTGGAGAAAGG 
AATGTGCTCATCACGAGTGCCCTCCNTTACGTCAACAATGTCCCCCACCTTGGGAA 
CATCATTGGTTGTGTGCTCAGTGCCCGATGTCTT 

Sequence ID 1368

CAGTGAGCCAAGATCACACCACTGCACTCCAGCCTGGACAACAGAACGAGACTCCA 

TATCAAAAAAATTAAATTAAAATATAATAAATTTCTTGCCGGGCGCAGTGGCTCAC 

ACCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACGAAGTCAGGAGA 

TTGAGAGCATCCTGGCTAATACAGTGAAACCCCGTCTCTACTATAAATACAAAAAA 

TTAGCTGGGCATGGTGGCGGGCGTCTGTAGTCCCAGCTACTCAGGAGTCTGAGGCA 

GGAGAATGGTGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCGTGCCACT 

GCAATCCAGCCTGGGCAGCAGAACGAGACTCCATCTCAAATAAATAAATAAATAAA

ATGAATTTCAGCTAGAAGAGCCTTATTCCATTTTCCTTTTTATTAAACAT CTGGCA 

TAAGTTGGTAAGTATGTGAAGTTTATCATATATTCTTATGCGAATTATTATTTTCG 

CCTTTTTTTTTATAATTCTGTCTGGGATTTGAATAGTAGAGTTTGAATTCAGGAAG 

GACACCTGTGATAGGACAATAAAAT 
Sequence ID 1369 

CTGATTGCAAAAACATTACAACTCAGTACTGCGGCTTTCATTCAAATAGGTTATAT 
GTATAAACTGAGGTTCAACAATATTGTATTTGAGATGGGAAAGTTAAAGAAATGCA 
ATAATGTAAATAATACTTAAGAAAATAAGATCTCAGGAAACTGTGTATACTCTGTA 
CTTTTATGCAACTTTATCAGATCATTTCAGTATATGCATCAAGGATATAGTGTATA 
TGACATGAACTTTGAGTGCAAAAACTGTACTATGTACCTTTTGTTTATTTTGCTGT 

CAACATCTA2^ATAAAGGTTTTTTTG 

Sequence ID 1370

CGAAAGGACTACAGAGCCCCGAATTAATACCAATAGAAGGGCAATGCTTTTAGATT 

AAAATGAAGGTGACTTAAACAGCTTAAAGTTTAGTTTAAAAGTTGTAGGTGATTAA 




AATAATTTGAAGGCGATCTTTTAAAAAGAGATTAAACCGAAGGTGATTAAAAGACC 
TTGAAATCCATGACGCAGGGAGAATTGCGTCATTTAAAGCCTAGTTAACGCATTTA 
CTAAACGCAGACGAAAATGGAAAGATTAATTGGGAGTGGTAGGATGAAACAATTTG 
GAGAAGATAGAAGTTTGAAGTGGAAAACTGGAAGACAGAAGTACGGGAAGGCGAAG 
AAAAGAATAGATAAGATAGGGAAATTAGAAGATAAAAACATACTTTTAGAAGAAAA 
AAGATAAATTTAAACCTGAAAAGTAGGAAG 

Sequence ID 1371 

GTCCAGNAGAAAGTTCAGTGACTTGTCCAGAGCTGCAGGTCTTAAGAGGCTGAAAT 

CTCGCCTCTGCGTCGAGGCTGCGGTTCCACTGACCCATACTACTTGCCTTCAGGAA 

AGAGAAATGGTGTAGGAAGGCTGTGGATGAAGACGCTTACATTCATGAAGGATTTG 

GATAGGCGAACATGAGCTTTTCCACCAAATTTCAGAATTTTAAGAAATGCCTTAAA 

TTATTTCTTAAAAATC^^TTTGGGGCAGACGAGAAGTTCTGATAATAGTTTTTAGG

GAACATGATAAAATTCTGACCTTAGAAGTGGTATACCAGTTTGAGAAGAAGAACAA 

GCTATAAACGGTGTAGATAACATTCACGGCTATTTAAGAAAGAGTTACTAAGGGAA 

ACCAGAATGACTTAAGAGTGTTACTCTTCTTTTTCTGAGAGAACAATAGCATCATC

TCAGAAAGCCTTTCATGCCATTAATAGGTAAGAATCTGGGCTTCTTGGACCATGGG 

TTAGACTTTCTTACAAAACCATAATATGCATTTCCTAGCAAAATTTATGCTATTAC 

ATTTCCTTATCTCAACAAAGACTGGTAAATTCAGTACTTATTCCTCAATTTTCCTA 

CCCTTAAAATGGGGATATTCTGCCTCTCCAAGGAATGCTGGGAACAAGCAAGTCCT 

CATGTTAGGGGTCTTTGAGTTTTCATGGAAGTTTAGGTTATTTATATGATGACATA 

GTTGTCAACTTACTTTCAGGATGGACTTTTCTTTTGTGAGTTTGTGACCTAAATAC 

AATAGTTGTTATGCATGTCCAGTTTATGGAAGTACCACTGCAATANCAG 

Sequence ID 1372 

CAGTGCAGCCAAGTATCACACCACTGCACTCCAGTCCTGGACAACAGAAACGANTA 
CTCCATATCAAAAAAATTAAATTAAANGATAATAAATTTCTTGCCGGGCGCAGTGG 
CTCACACCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACGAAGTCA 
GGAGATTGAGACCATCCTGGCTAATACAGTGAAATCCCCGTCTCTACTATAAATAC 
AAAAAATTAGCTGGGCATGGTGGCGGGCGTCTGTAGTCCCAGCTACTCAGGAGTCT 
GAGGCAGGAGAATGGTGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCGT 
GCCACTGCAATCCAGCCTGGGCAGCAGAACGAGACTCCATCTCAAATAAATAAATA 
AATAAAATGAATTTCAGCTAGAAGAGCCTTATTCCATTTTCCTTTTTATTAAACAT 
CTGGCATAAGTTGGTAAGTATGTGAAGTTTATCATATATTCTTATGCGAATTATTA 
TTTTCGCCTTTTTTTTTATAATTCTGTCTGGGATTTGAATAGTAGAGTTTGAATTC 
AGGAAGGACACCTGTGATAGGACAATAAAATCTA 



Sequence ID 1374

GAAAGCACATATGATATACATGTGTGTCATATGTATTATTTTGTTTGCCATCTGAG 
TCTTCAAAATTTGTTACAGAATACCTGCATATTAATATTTCAAGGTATGGATTAAT 

Sequence ID 1378

CTGAGTATTAACTAAAAAAAAAAAAAAAAAAAAAAAAAAA 

Sequence ID 1380

CCAAACCCAACTGGTCCAGTAGGATACTCACCTTACAGGGGGCGTCTCAAGAGTCT 
CACAGTTCCCTTGGGTCTTAAGAGACTCACTGTTGGACCAGGCGTGGTGACTCACG
CCTGTAAAACCAGCACTTTGGGAGGCCGAGGCGGGCGGATCAGTTGAGGTCAAGAG 
TTC^GACCAGCCTGACCAAGGTGCTGAAACCCCGTCTCTACTAAAAATACAAAAA 
TTAGCCAGGGATGGTGGTGTGCGCCTGTAATCCCAGCTACTCCAGAGGCTGAGGCA 
GGAGAATCTCTTGAACCCAGGAGGTGGAGGTTGCAGTGAGTCGAGATCATGCCACT 
GC^CTCCAGCCTGGGTGACAGAGCGAGACTCCGTCTTAGAAAAAAAAAAAAAAAAA
AAAAGAACCTCACAGTTCAGCAGGGTTCTAGCATGAGACAATGAGGACAAGGGTAG 
GTGAGCAGGTGGAAAGfiGTGAGAACAGGTCAATTGTGATGGAGAAAATAATAAAGA 
CAGAAAAGGCAGAAGACTGCCTGGCAGAAGACCTGTCCCAGCAGATACAAAAATAC 
AGACAACAGGAGCCAGCATAGACCCTTGACCTGTGTAAGTCTTTCTCAGGCCTTCT 
TTTAAGTAGAAACATGCCTTTGAAAAAAAGTTTTAATAAACAGGAAAATCATAAAT
CCCTATTTACATAAATAATATATCCTGGTCTTATTCTTAAAACCATTGATTTTTCA
CGGCTCATTAANAAAGCTGGGCGAGGTGGCTCACGCCCGTCATCCTAGCACTTTGG 
GAGGCCGAGGCGGGCANATCACAAGGTGAGGAGTTGGGAGACCAGCCTGACCAACA 
CGGTGAAACCCAGTCTCTACTAAAAATACAAAAATTANCTGGGGGTGGTGGTGTGT 

GCCTGTAATCCAAGCTACTCGGGAGGCTGAGGCAGGA

Sequence ID 1382 

CTTACTACCTCCAACATGAAACAAGCAGCCCCGCACTTCTCGAAGGTCTGAGTTAC 
TTGGAATCGTTTTACCACATGATGGACAGAAGGAATATTTCAGATATCTCTGAAAA 

CCTCAAGCGTTACCTTCTTCAGTATTTTAAGCCAGTGATTGAGAGGCAAAGCTGGA

GTGACAAGGGCTCAGTCTGGGACAGGATGCTCCGCTCGGCTCTCTTGAAGCTGGCC 
TGTGACCTGAACCATGCTCCTTGCATCCAGAAAGCTGCTGAACTCTTCTCCCAGTG 
GATGGAATCCAGTGGAAAATTAAATATACCAACAGATGTTTTAAAGATTGTGTATT 
CTGTGGGTGCTCAGACAACAGCAGGATGGAATTACCTTTTAGAGCAATATGAACTG 
TCAATGTCAAGTGCTGAACAAAACAAAATTCTGTATGCTTTGTCAACGAGCAAGCA
TCAGGAAAAGTTACTGAAGTTAATTGAACTAGGAATGGAAGGAAAGGTTATCAAGA 
CACAGAACTTGGCAGCTCTCCTTCATGCGATTGCCAGACGTCCAAAGGGGCAGCAA 
CTAGCATC3GGATTTTGTAAGAGAAAATTGGACCCATCTTCTGAAAAAATTTGACTT
GGGCTCATATGACATAAGGATGATCATCTCTGGCACAACAGCTCACTTTTCTTCCA 
AGGATAAGTTGCAAGAGGTGAAACTATTTTTTGAATCTCTTGAGGCTCAAGGATCA 
CATCTGGATATTTTTCAAACTGTTCTGGAAACGATAACCAAAAATATAAAATGGCT 
GGAGAAGAATCTTCCGACTCTGAGGACTTGGCTAATGGTTAATACTTAAATGGTCA 
ATAGAAAAAGTAGGCTGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGA 

Sequence ID 1387 



10 TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT.TTTTTTTTTTTTCAGT 
GTTAAAGTAGGTTTGTCGACGCGGCCACGAATTTCCCGGGGACCAA 

Sequence ID 1389

TTTTTTTTTTTTTTTGGGAGTCAGTTTTCTTTTCTTTTCTTTCTTTTTTTTTTTTT 
GNTTTTCGGAAACGGAGTCTCGCTTTCTCGCCCACTCTGGAGTGGNGCAGTGGGGN
GGTCTCAGCTCACCACAGCCTCCACCTCCTGGGCCCAAGCGATCC'iraTrCACCTGAG 
CCTCCTGCGTAGCTGGGACTACAGGCGTGCACCACCATTCCCAGGTAATTTTTGTA 
TTTTTTGTANANACAGGGTTTCACTGTTGTTGCCCAGGCTGGTCTCGAACTCCTGC 
TTCAGTCTGCCANAATGCTGGATTCTAGGCGTGAGCCACCGNGCCTGGCCCAAAAG 
TTACTTTTCTTACAGAAGCAAAGCTTTAATGCATTTTACTGAATGCTTATAGCTTT
GTAGATACTGAAAAGAGTATGAGCGTCACATACAGACACATNTAACAGCACTGCCT 
CCAACCAGCCCCTACCCACTGGTCAGGNGAGTAANAATCAAAATTCTTTTCTGNGA 
GTGGAACGGAAATTTCATCTCTCCTCCTCAGGCAAGTAGTTAANAGGCTGGNGGGA 
GTCATGGCCCCATTTTGTTCAAAATACAAGCTCCACAGGAACAAAAGGCTGAACTG 

CTCACCTCCCAACTGATGAACCTCGTCTTTGTTCCATGTCAAAGGGGCCTTTGTGT

TACTGCAGCAGAAACTCCAGCTATCAAACCATCAGGCACCAAAAGTAAAACTCCTT 
TCTCTAAAAAGACCTCTCTTTACCTGAGCCTTTCAATGCATCTTTGCCCCCANATA 
ATCCTGGATGAGATAATCCCCAGAGGAANACCAGCGCTTGCCTAGTGAAATTATAC 
TATGAGACAAGGGTAAAAGACCTCAAANACCGGGTTGGCAGGTAAGGGAGTAGGGN 


Sequence ID 1390 

TCNGTGGCACCCGTTTCCGGCACCTTCAGACTCTGAAGAGCCACCTGCGAATCCAC 
ACAGGAGAGAAACCTTACCATGTACGTAAGCCTCTTGAGGCCGCTCTCTGACCTGC 
GGGGATGTGGAGGGCAGGGAAGGAGGTGGAGCGCAGGGAAGGAGGTGGAGCAGGGA 

GGCAGTGGAACTGTTTGCTCCCATCTCAAGCACACAGTGGGGCAACCACTACGCTA

ATGGTTGGAAGACCTAGATCTGGGCCCAATGGCCAGACACCCTGCTTGACCTTGGC 
CCAAGCATTAGGGGACTCATCTTTAAAATGAGGGTATGGGACTAGATGATCTGGGC 
CTTAGGAGAGGAGT
Sequence ID 1391 

CGGCTNCTACCCTGCGGAGATCACACTGACCTGGCAGTGGGATGGGGAGGACCAAA 
CTCAGGACACCGAGCTTGTGGAGACCAGGCCAGCAGGAGATGGAACCTTCCAGAAG
TGGGCAGCTGTGGTGGTGCCTTCTGGAGAAGAGCAGAGATACACGTGCCATGTTCA 
GCACGAGGGGCTGCCGGAGCCCCTCACCCTGAGATGGAAGCCGTCTTCCCAGCCCA 
CCATCCCCATCGTGGGCATCGTTGCTGGCCTGGCTGTCCTGGCTGTCCTAGCTGTC 
CTAGGAGCTATGGTGGCTGTTGTGATGTGTAGGAGGAAGAGCTCAGGTGGAAAAGG 
AGGGAGCTGCTCTCAGGCTGCGTCCAGCAACAGTGCCCAGGGCTCTGAf GAGTCTC
TCATCGCTTGTAAAGCCTGAGACAGCTGCCTGTGTGGGACTGAGATGCAGGATTTC 
TTCACACCTCTCCTTTGTGACTTCAAGAGCCTCTGGCATCTCTTTCTGCAAAGGCA 
TCTGAATGTGTCTGCGTTCCTGTTAGCATAATGTGAGGAG<jTGGAGAGACAGCCCA 
CCCCCGTGTCCACCGTGACCCCTGTCCCCACACTGACCTGTGTTCCCTCCCCGATC 
ATCTTTCCTGTTCCAGAGAAGTGGGCTGGATGTCTCCATCTqj-GTCTCAACTTCAT
GGTGCGCTGAGCTGCAACTTCTTACTTCCCTAATGAAGTTAAGAACCTGAATATAA
ATTTGTTTTCTCAAATATTTGCTATGAAGGGTTGATGGATTAATTAAATAAGTCAA 
TTCCTGGAAGTTGAGAGAGCAAATAAAGACCTGAGAACCTTCCANAATCCG 

Sequence ID 1392

TGAAACAAAATGAATTTNTATGGGTAAGAGAGGGTAATATTTTAGAGTTGTGTTAC

AAAACTACAAATTTTTATTAAATTAATAAATCAGAATACTAAATCCATGTGTTTTT 

TTCTTTCTTAAAAAATATCTTTTGGCTGGGCACGGTAGCTCATGGCTGTAATCCCA 

GCACTTTGGGAGGCTGAGGTGGGTGGATCGCCTGATGTCAGGAGTTCAAGACCAGC 

CTGGTCAACATGTTGAAACCCCATCTCTACTAAAAATATAAAAATTAGCCGGTGTG

GTGGTGGGCGCCTGTAATCCCAGCTACTCAGGAGGCTAAGGCAGGAGAATTGCGTG 
AACCCAGGAGTTCAGTGATGTAGCGGGGAGCTGAGATTGTGCCACTACACTCCAGC 
CTGGATGACAGAGTGAGACTCCATCTCAAAAAAAAAAAAAAAAAA 

Sequence ID 1394

GCATAATGTGAGGAGGTGGAGAGACAGCCCACCCCCGTGTCCACCGTGACCCCTGT 

TCCCATGCTGACTTGTGTTTCCTCCCCAGTCATCTTTCCTGTTCCAGAGAGGTGGG 

GCTGGATGTCTCCATCTCTGTCTCAACTTTATGTGCACTGAGCTGCAACTTCTTAC 

TTCCCTACTGAAAATAAGAATCTGAATATAAATTTGTTTTCTCAAATATTTGCTAT 

GAGAGGTTGATGGATTAATTAAATAAGTCAATTCCTGGAATTTGAGAGAGCAAATA

AAGACCTGAGAACCTTCC^GAAAAAAAAAAAAAAAAAAl^AAAAAAAAAAAAAAAA 
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
Sequence ID 1395

CTTACCATGTCAGTGCACAGAAATGCTGTCTTGGGATGTAC3GAAAAATAAATCCAC 
AAAAGCTACCAAGTTTGAAGGGGACCATGAGTCTTCAGGCTGGAGCTTCCAAACCA 
GATGAAAACCCCACAATTAACCTGCAGTTTAAGATCCAGCAGCTGGCCATTTCTGG 
ACTCAAGGTGAATCGTCTGGATATGTATGGAGAAAAGTACAAACCCTTTAAGGGCA
TAAAATACATGACCAAAGCTGGGAAGTTCCAAGTTCGAACCTGAAGGGAGCATTTG 
CTGAGGGAATAGTCTTGCACATTTTTTCATTTCTTACTTGTCTAAAAGTAAAAAAA 
AATATCAGCCTGTCTCCTAGGTCAGTCCCCTCCTGGACCCACCCGCTCCCTTTTTT 
CCTTAGCCTTCAGTGCCATGGAACTAATCAAGGGAGGAAAAGGTCACCAGGGAGAA 
CTGGACAGAACTGAAACACAGCAACACCAGTTCTCAAGGACAAGGTGTGTGATGGG

GGTAGGAAGCTTGGTGCTTATGTAACCATTTTAAACGTGGTTTCTATAGGAAAGAC 
CAACATTTGTTTAGCTTGCTTGGCTTTAATTATCTAAAGCCAATGAAAGACTTCTT 

TGTTGATTTTTTAAGATAGAAAGATT 

Sequence ID 1396

CAAACACTATGTTATTTTATGAANAAGACTTGAACATCTATGGATTTTGGTATTTG 

CAAGGGGTGAATGGGGTATTTGCAAGCAGTGAATGAGGAGGCCTGGAACCAATCTT 

CTGCTGATATTGAGGCACAACTGAAAAAGGTATATTACTTAAATCTCTTATTGTAT 

TGTAAACTGTATAAGTAATGAAATTAAAAGGCAGAAATTGTCAGACTGAATAAAAT 

GAAAAGACCAAACAATATGCTGCTTACAAGAAACACAATTCAAATATAAGGACACA

ATTAGTTTAAAGGAAAAGAACTGGAAAAGATATACCATGATAACACAAGTCAGAAG 

AAAGCTGCTGTGGATATATTAATATGAGATGTAGATTTGAGAGCAGTGAATATTGC 

CAGGCATAAAGAAAGTTATTACATAATAATTAAGGTATCAGTTCATCAAGAAGATG 

TAATAACCCTAAGTATTTATACAACTAATATCAGAGCTTCAAAATACATGAAGCAA 

AAACCAGTGGAATTGATAGGAGAAACACACAATTACACAATTATAGTCAGAATTTT
CAACATATCTTTCTCAATGGAGAAAACAACTAGACAGGAAATCATTAAGGATATAG 
ATGATTTAAATTATATGATCAACTACCTGGACGTAATTGGCATTTATGGAACACTG 
CACCACCAACAGCAGAGTACATATTATTTTCAAGTACACAGAAAACAGTTACCAAT 
ATAGACCATTTTCTGGGTCATAAAACACATCTCAATAAATGTAAAACAATTAATGT 

TATATAAAGTATGTGCTCTGACCNCAAAGGAATTAGAGATCAATAAAAGAACATCT
TTGAAAAATCTCACNTATTTAAAAACTAATAACTCACTTCTAAATAACTCCTGTNT 

CAAGAGAATNAAANGG 
Sequence ID 1397 


CCCAGCCTCACTGCGCCCCGTCAGGCCAGGCAGCTGCCCTCAGGGTCTGCCAAGGT 
GGGGGTCAAGGGCCATGGGGGCAGGTAGCTCTGCCTGCAAAGCCCACAAGCATGTC 
AGATCACCTGGGCTGCAGACAGACAAACACCTGAGCTGTTCTGAATACCTTCAGGT
TCCTGGCCTCGCTGAGCAAGTGCAGAAATTTTTACCTTCAAGGATCAGGGTTTTTC 
TGTTTGTTTGTTTTTTAACACACACATATGTGAACAAAGAGTATGCGTTTGTACTG 
GCAGAAGAAGCGTCTGGTAAGACAACCAGCAAGTTAACAATGGTCACCTCCAGAAA 

GCTCTGTCACCCAGGCTGGAACGCACTGGTGTGATCACGGCTCACTGCAGCCTTGA 
CCTCCCTGGCTCAAGCAATCCTCCCAGCTCAGCCTCCTGAGTCGTTGGGACTACAG 
GCACGTGCCACCACGCCTGACACATTTTTTAAATTTTTGTAGAGACAGTGTTTCAC 
CATGTTGCCCAGGCAGGTCTCAAACTCCTGGGCTCAAGTGGTCCTCCAGCTTCAGC 
CTCCCAAA.GTGCTAGGATTATAGGTGTGAGCCACAGTGCCCAGCCCCGTAGTGGAG
AATTTCTGTTGAATGAACCAAAA.GCAACTGCCAACCTCTCCATGCACCATGTGTTT
CAGAGGAGA2^GCACAGTGAAGAATGCAGTGTGTTCTGAGGTCCTGTCACCCCTGA 
GGCTGTGTGTGTCCTTTGCCAAATTAAAGAGTCTTACTGAATGCGGTGCATCCAGG 
AGACAGGCCNAGGTTTGGACTGGTAAAAAAAAA 


Sequence ID 1399 

CAGACACCTGGNAGAACGGGAAGGAGACGCTGCAGCGCGCGGACCCCCCAAAGACA 
CATGTGACCCACCACCCCATCTNTGACCATGAGGCCACCCTGAGGTGCTGGGCCCT 
GGGCTTCTACCCTGCGGAGATCACACTGACCTGGCAGCGGGATGGCGAGGACCAAA 
CTCAGGACACCGAGCTTGTGGAGACCAGACCAGCAGGAGACAGAACCTTCCAGAAG
TGGGCAGCTGTGGTGGTGCCTTCTGGAGAAGAGCAGAGATACACATGCCATGTACA
GCATGAGGGGCTGCCGAAGCCCCTCACCCTGAGATGGGAGCCATCTTCCCAGTCCA
CCGTCCCCATCGTGGGCATTGTTGCTGGCCTGGCTGTCCTAGCAGTTGTGGTCATC
GGAGCTGTGGTCGCTGCTGTGATGTGTAGGAGGAAGAGTTCAGGTGGAAAAGGAGG 

GAGCTACTCTCAGGCTGCGTCCAGCGACAGTGCCCAGGGCTCTGATGTGTCTCTCA

CAGCTTGAAAAGCCTGAGACAGCTGTNTTGTGAGGGACTGAGATGCAGGATTTCTT 
CACGCCTCCCCTTTGTGACTTCAAGAGCCTCTGGCATCTCTTTCTGCAAAGGCACC 
TGAATGTGTCTGCGTCCTTGTTAGCATAATGTGAGGAGGTGGAGAGACAGCCCACC 
CTTGTGTCAACTGTGACCCCCTGTTCCCATGCTGACCTGTGTTTCCTCCCCAGTCA 

TCTTTTTTGTTCNCAATAGGTGGGGCCTGGATGTCTCCATCTCTGTNTCA



Sequence ID 1440 

TTATAAGGTACTTTTAAGGTATTTTAGTTGTCTTAGTCTATATTTCTGTACTCACC 
TTTCTTTATCCACTCATCAGTTGATGGGCATGTAGGTTGGTTCCATATCTTTGCAA 
TTCTGAATTGTGCTGTGATCAGGTGTCTTTTTAGTATAATGATTTACTCTCCTTTG 
GGTAGATACCCAGTAGTGGGATTGCTGGATCGAATGGTTTTTATAATTTTCTATTT 
TAC(^CAGTTTCTCTCTGCATTTTTCCTCTTTGACCACTAACCATGTGAAATTCTC
ATATTGACCTTTATAATGATCATGAACTCTTAGTATCATTGGGAAGGCCACATTTG 
CCACTTATGATTGTAAACCTTATCCTCCATTTTTCCTGTTATTGTTGGTGCAAAAA 
GCACCTATTATACCAGGACTTTAAAAATCAGTCTGATAAGTCTTTGATAAGTCTAA 
TAATAATAACTGATAAGTCCATTGAATTTGCTTCTGATTACTTTTTCTTTAGTAGC 
TAAACATGTATGTACTCCTATGATTACAATGAACACTCCTCTCCATTTAAATTAAT 
TATTTACATTGATGAAATAGCAAAATGTTAATGACTAAATACTGTCTTGGTTTTTT 
CGTTCCAGGTCAGTCAATATTAACTTCTTATAATTTTCTTTTTTTTCTTT 

Sequence ID 1447 

GCAAGGACTAACCCCTATACCTTCTGCATAATGAATTAR.CTAGAAATAACTTTGCA 

AGGAGAGCCAAAGCTAAGACCCCCGAAACCAGACGAGCTACCTAAGAACAGCTAAA

AGAGCACACCCGTCTATGTAGCaAAATAGTGGGAAGATOTATAGGTAGAGGGGACA

AACCTACCGAGCCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAGTTCAACTTT 

AAATTTGCCCACAGAACCGTCTAAATCCCCTTGTAAATTTAACTGTTAGTCCAAAG 

AGGAACAGCTCTTTGGACACTAGGAAAAAA.CCTTGTAGAGAGAGTAAAAAATTTAA 

CACCCATAGTAGGCCTAAAAGCAGCCACCAATTAAGAAAGCGTTCAAGCTCAACAC 

CCACTACCTAAAAAATCCCAAACATATAACTGAACTCCTCACACCCAATTGGACCA 

ATCTATCACCCTATAGAAGAACTAATGTTAGTATAAGTAACATGAAAACATTCTCC 

TCCGCATAAGCCTGCGTCAGATTAAAACACTGAACTGACAATTAACAGCCCAATAT 

CTACAATCAACCAACAAGTCATTATTACCCTCACTGTCAACCCAACACAGGCATGC 

TCATAAGGAAAGGT 
Sequence ID 1448 

GGCCACCGGGTGCAAGGTCAGGGCTGGGGTGGAGGCTGGGAAGCCCAGGGCTTGGC 

CCACTGTGGCCGCCTTGTGTGGTCACTGCTTTCCTGGGCCTGCTGTGAGCTCCCTC 

TAGGACCCCAGGCCTGTCTGGTGGGTCACTGTGACCACCACCTTGCACAGCACCTG 

GCGCGTGGCAGGTGCTCAAACATTACTTGTTTCGGAATGAACTTCATCTTGCTCTT

GGCTTTTTGACTAATGCTGTGGAACATCTGACTAATTAGTGACTCTTTGGGGCCCC 

CAGTTTCCCAGCTATAAAGTGGTAATATTAAGATAATAATTCGGCCGGGCGCGGTG 

GCTCACGCCTGTAATCCCAGCAGCACTTTGGGAGGCCGAGGTGGGCAGATCACGAG 

GTCAGAAGATCGAGACCATCCTGGCTAACACGGTGAAACCCCATCTCTACTAAAAA 

TACAAAAAATTANCCGGGCGTGGTGGCGGGCGCCTGTAGTCCCAGCTACTCANGAG 

GCTGANGCAGGAGAATGGTGTGAACCCGGGAGGCAGAGGTTGCAGTGAACCAAGAT 

CGNNCCACTGCACTCCAGCCTGGGCAACAGAGCGAGACTCCATCTTAAAAAA 
Sequence ID 1449




AATCAGGGCCGCAGTGTGTTCTGCGCCTGCCCAGAGCTGACTCCTGATTTAACCGC 
TGGCGTAA.CCGCGGGTTGCACGCATGCGTGCTGAAAAGCCTTTCACCCTCACGTGG 
TTTCTTTTTTAACCAGTCATCAAGCGAGGCTCGCGCGCAGGCCCCGCGTTGGAAAA 
TGGCGGGGAAGCTGAAACCTCTGAATGTGGAGGCGCCAGAAGCTGCTGAGGAGGCT 
GAAGGTAGTGAGGGCAAGTGGGCTGCACTCCTTTCTCTCCAACCAGGGCAGAAAGG
AGGGAGGATTCGTCCCATTACAATAATGAAATAATGATATTCTAATTTTTTTAAAT 
AAAATGTTAAGCCTTTTGTTATTGAA 

Sequence ID 1450

GGAAANCATGAGGCTTCGGGAGCCGCTCCTGAGCGGCAGCGCCGCGATGCCAGGCG

cgtccctacagcgggcctgccgcctgctcgtggccgtctgcgctctgcaccttggc 
gtcaccctcgtttactacctggctggcggcgacctgagccgcctgccccaactggt 
cggagtctccacacIgctgcagggcggctcgaacagtgccgccgccatcgggcagt 
cctccggggagctccggaccggaggggcccggccgccgcctcctntaggcgcctcc
tcccagccgcgcccgggtggggactccagcccagtcgtggattctggccctggccc
cgctagcaacttgacctcggtcccagtgccccacaccaccgcactgtcgctgcccg 
cctgccctgaggagtccccgctgcttggtaaggactcgggtcggcgccagtcggag 

GATTGGGACCCCCCCGGATTTCCCCGACAGGGTCCCCCANACATTCCCTCAGGCTG 
GCTCTTCTACGACAGCCAGCCTCCCTCTTCTGGATCAGAGTTTTAAATCCCANACA 
GAGGCTTGGGACTGGATGGGAGAGAAGGTTTGCGAGGTGGGTCCCTGGGGAGTCCT
GTTGGAGGCGTGGGGCCGGGACCGCACAGGGAAGTCCCGAGGCCCCTCTAGCCCCA
AAACCANAGAAGGCCTTGGAGACTTCCCTGCTGTGGCCCGAGGCTNAGGAAGTTTT 
GGAGTTTTGGGTCTGCTTANGGCTTCNAGCAGCCTTGCACTGAGAACTTTGGTAGG 
GACCTCGAGTAATCCACTCCNTTTTNGGGACTGACGTGAGGCTCCCGGTGGGGAAA 

GANACTGACCTNTC

Sequence ID 1453 

CCGACCTGTCTCGCTCCGTGGCCTTAGCTGTGCTCGCGCTACTCTCTCTTTCTGGC 
CTGGAGGCTATCCAGCGTACTCCAAAGATTCAGGTTTACTCACGTCATCCAGCAGA 
GAATGGAAAGTCAAATTTCCTGAATTGCTATGTGTCTGGGTTTCATCCATCCGACA
TTGAAGTTGACTTACTGAAGAATGGAGAGAGAATTGAAAAAGTGGAGCATTCAGAC 
TTGTCTTTCAGCAAGGACTGGTCTTTCTATCTCTTGTACTACACTGAATTCACCCC 
CACTGAAAAAGATGAGTATGCCTGCCGTGTGAACCATGTGACTTTGTCACAGCCCA 
AGATAGTTAAGTGGGATCGAGACATGTAAGCAGCATCATGGAGGTTTGAAGATGCC 
GCATTTGGATTGGATGAATTCCAAATTCTGCTTGCTTGCTTTTTAATATTGATATG 
CTTATACACTTACACTTTATGCACAAAATGTAGGGTTATAATAATGTTAACATGGA
CATGATCTTCTTTATAATTCTACTTTGAGTGCTGTCTCCATGTTTGATGTATCTGA
GCAGGTTGCTCCACAGGTAGCTCTAGGAGGGCTGGCAACTTAGAGGTGGGGAGCAG
AGAATTCTCTTATCCAACATCAACATCTTGGTCAGATTTGAACTCTTCAATCTCTT
GCACTCAAAGCTTGTTAAGATAGTTAAGCGTGCATAAGTTAACTTCCAATTTACAT 
ACTCTGCTTAGAATTTGGGGGAAAATTTAGAARTATAATTGACAGGATTATTGGAA
ATTTGTTATAATGAATGAAACATTTTTGTCATATAAGATTCATATTTACTTCTTAT 

ACA 

Sequence ID 1454 

TAAATAGGGAATCCTTTCCCCATTGCTTGTTTTTCTCAGGTTTGTCAAAGATCAGA 
TAGTTGTAGATATGCGACGTTATTTCTGAGGGCTCTGTTCTGTTCCATTGATCTAT 
ATCTCTGTCACATGCACACGTATGTTTGTTGTGGCACTATTGACAGTGGCAAAGAC 
TTGGAACCAACCCAAATGTCCAACAATGATAGACCGGGTTAAGAAAATGCGGCACA 
TATACACCATGGAATACTATGTAGCCATAAAAAATGATGAGTTCGTGTGCTTTGTA 
GGGACATGGATGAAATTGGAAATCATCATTCTCAGTAAACTATCGCAGGAACAAAA 
AACCAAACACTGCATATTCT(^CTCATAGGTGGGAATTGAACAGTGGGAACAeATG 
GACACAGGAAGGGGAACATCACACTCTGAGGACTGTTGTGGGGTGGGGGGAGGGAG 
GAGGGATAGCATT(jGGAGATATACCTAGTGCTGGATGACGAGTTAGTGGGTGCAGC 
GCACCAGCATGTCAC^TGTATACATATGTAACTAACCTGCACATTGTGCACATGTA 

CCCTAAAACTTAAGGTAT 
Sequence ID 1456 

CCGCAACAAACACGGGAGTGCAGATATCGCTGCGATGGGCTGATTTCCTTTATTTG 
GGTATATACCCAGCAGTGGGATTGCTGGATTGTATGGTAGCTCTATTAGTTTTTTG 
AGGAACCTCCAAACTGTTCTNCATAGTGGTTGTACTCATTTACATTCCCACTGTGA 
ACCCTGAAAATTTGAGGCAGGTCTCAGTTAAATTAGAAAGTTGATTTTGCCAAGTT 
GGGGACACGCACTCGTGACACAGCCTCAGGAGGAACTGATGACATGTGCCCAGGTG 
GTCAGAGCACAGCTTGGTTTTATACATTTTAGGGAAACCTGAGCCATCAATGAACA 
TACGTAAAATGGGCCGGGCACAGCAGCTCAAGCTGTAATCCCAGCACTCTGGGAGG 
CCGAGGCGGGTGGATCACTTGAGGTCAGGAGTTCGAGACCAGCCTGGCCAACATGG 
TGAAACCCCGTCTCTATTAAAAATACAAAGCTTAGCTGGATGTGGTGGCGCATGCC 
TGTAGTCCCAGCTGCTCTAGGAGGCTGAGGCATGAGAATTGCTTGAACCTGGGAGG 
CAGAGGCTGCAGTGAGCCGAGATCGAGCCACTATACTCCAGCCTGGTCAACAGAGT 

GAGACCCTGTCT 

Sequence ID 1460 

CCACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCACCTGACTCCTG
AGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGT
GGTGAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGA 
GTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAGGTGAAGG 
CTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAAC 
CTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGA
TCCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACT 
TTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGT 
GTGGCTAATGCCCTGGCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATT 
TCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATG 
AAGGGCCTTGAGCATCTGGATTCTGCCTAAT2VAAAAACATTTATTTTCATTG

Sequence ID 1490 
ATGGGCkTCTCTCGGGACAACTGGC^^ 

GCCCTACCACAAGAAGCGGAAGTATGAGTTGGGGCGCCCAGCTGCCAACACCAAGA 
TTGGCCCCCGCGGCATCCACACAGTCCGTGTGCGGGGAGGTAACAAGAAATACCGT

GCCCTGAGGTTGGACGTGGGGAATTTCTCCTGGGGCTCANAGTGTTGTACTCGTAA 
AACAAGGATCATCGATGTTGTCTACAATGCATCTAATAACGAGCTGGTTCGTACCA 
AGACCCTGGTGAAGAATTGCATCGTGCTCATCGACAGCACACCGTACCGACAGTGG 
TACGAGTCCCACTATGCGCTGCCCCTGGGCCGCAAGAAGGGAGCCAAGCTGACTCC 
TGAGGAAGAAGAGATTTTAAACAAAAAACGATCTAAAAAAATTCAGAAGAAATATG
ATGAAAGGAAAAAGAATGCCAAAATCAGCAGTCTCCTGGAGGAGCAGTTCCAGCAG
GGCAAGCTTCTTGCGTGCATCGCTTCAAGGCCGGGACAGTGTGGCCGAGCAGATGG
CTATGTGCTAGAGGGCAAAGAGTTGGAGTTCTATCTTAGGAAAATCAAGGCCCGCA
AAGGCAAATAAATCCTTGTTTTGTCTTCACCCATGTAATAAAGGTGTTTATTGTTT 

TTGTT



Sequence ID 1491 

CTTNCACATACTGATTGATGTCTCATGTCTCTCTAAAATGTGTAAAACCAAGCTGT 
GCCCCAACCACCTTGGGNACATGTGGNGAGGACCTCCTGAGGCTGTGTCATGGGCA 

CACCTTAACCCTGGGAAAATAAACTTTCTAAACTGACTTGAGAGCTGTCTCAGATA
TTCTGAGCTTACAGTTATTGTGAAATCATTTTAATTATAAATTAAGTGGAGATTTA 
CTTAAAATCATGTGTAGAAGTAGCCTGTGATATAGTCCTAGATACATACATTATCA 
TCTTATGTATCTTCCCTCCCTCTTCCAGGTTCTGATAAAAACAGATGAAATCTGAA 
AGACCATGACAGTAGTATTTTGAAAATGACAGTATTTGAAATTAAAAAATTGTAAA 

AGTGTTCTGTTCTATCACTGCCAAAGGATAAGTTACAAATTGGTTCTTGGAACGTA
ATATGTACTATGTGCTTGCTATTTAATAATTTACCAGTCTTAGTCTTTTTTATTCA
TAGTTTTAGTTATAGTTTTAGTTGTAGTAG2\AAAAAGCATTTTCTGTAAGCTTAAT
TTCTTTCCCCTTCCCGCTTTCCCAGTCAGATGACTTTAGTGATTTGGAGTTGTGTG 
CTTTATAAGTGCATTCCTCAGAGGACTTAATATTACTAAGATTTTAGCAACNCTGA 
AATATGTT 


Sequence ID 1492 

TGTNCCTGTAGTCCTGTGTGGGAGGATTGCCTGAGCCTAGGAGCTCAAAGTTGCAG 
TGAGCCCAGATCGNGNCATTGCAGTCCAGCCTGGGTGACAGAGTGAGACCCCATGT 
CAAAAAAAAAAAAACAAAAAACAGGGGCCTGCCTCANCCAGCAGGTGAGGTCTGCC 

ACTGAGAGCACTTCTAGCAGGAGGAACAGCCTCCACCCCCACACTGCAATCAAGTT
TTTTGGGTCAGCCTTAGGAGCTAANAAAGGGCCTAGTTTGNCTAAATAGCAGGAGT 
TATATCCAGGGATCTTCAGGCCCAGGAATGCTAATGAGTAGGCATTCCATGGGCCC 
TGGGAATGGCTTTGTGTGCCANAAATGATGGCCACAAAGGCCTTGCTGCCTTTTTT
CAAAATGGCTGCATCCAGCTGAGTGCTCTCTGCCAA^GGGGANAANAAAATAAGTC 

TCCAGTGCATTTAGATTGGTCTCTCATCATCTCTCTCCTTTTTGTTTTTATTAGTQ
TCCTTAACC^AAACTGCCAAGAAAGGCTTGGAATTGAAACAAAACCTGATA^ 
GGTAAGAGGTTGTTCTTTT 

Sequence ID 1493 
TGTNTCAAAAAAAAAAAAAAGAACGGNAATGTACTGGAGATGTATTTGATAACCAA
GGNTTTAGGTAAATTTTCACCAGTATTAGTTNTATTTGCAAACTGAAAAATGTTGT 
AGGCTTAATATAAAATAACCACATTAGTGAACATTATATCTCTTAGAAGAAAGGCC 
ATATTTTGCTCCTGCTTCTGTAAAAATATTATTTGTTTGAAGGGGAAATAATGGTA 
GTGTGACCTTTCACTTAATTCCTACTCCCTTAATGTGAGAGAGACAAAATGAGCTG 

AAGAAGGAAAATTCTGGAGTTAC^CTCC^CAACCTTGAACATACTGACGGACATCT

CTGTTTTGACAACGATTTCTCCATGCCACCCATGCTNTAATGCCTTGTGGATCACG 
GACAACCCTCTTTGCACAAGCTACAG 

TGCAGGATAAATGACAGGCATTAACTGCTCCTGGGGTTTTGCCATCATTACACCAG 
TAGCGGCTATTGATCTGAAATATCCCATAATCAGTGCTTCTGTCTCCAGCATTGTA 

GTTTGTAGCTCGTGTGTTGTAACCACTCTCCCATTTGGCCAAACACATCCAGTTTG

CTAGGCTGATTCCCCTGTAGCCATCCATTCCCAATCTTTTCAGAGTTCTGGCCAAC 
TCACACCTTTCAAAGACCTTGCCCTGGACCGTAACAGAAAGGAGGACAAGCCCCAG 
AACAATGAGAGCCTTCATGTTGAC 

Sequence ID 1494

TTGGTACCCGGGAAATTCTTTGCCGCGTCGACGGCCGGTGAGGCAGATCACCTGAG 

CCCAGGAGTTCAGGACCAGCCTGGGCAGCATACCGGGATTCCATCTNNACTAAAAA 




CAGTAGGCTGGGTGTGGTGGCTCATGTCTGTAAGCTCAGGACTTTGGAAGGCCAAG 
ATGGGAGGATCACTTGAGCCTGGGAGTTTGACACCAGCTTGAGCATCGTAGCCAGG 
CCCTGACTCTACAAAAAAGTGAAATAATTAGCCGAGTGTGGTGGTTCACACCTGTA 
ATCCCAGCTGCTCAGGAGGCTGAGGTAGGAGAATCATTTGAACCCGGGAGGTGGAG 
GTTGCAGTTAGCCGAGATCACGCCATTGCACTCCGGCCTGGGCGATAAAGCGAGAC
TCTGTCTCAAAAAAAAAAAAAA 



Sequence ID 1495 

ATTCGGGCCGAGATGTCTCGCTCCGTGGCCTTAGCTGTGCTCGCGCTACTCTCTCT 
TTCTGGCCTGGAGGCTATCCAGCGTACTCCAAAGATTCAGGTTTACTCACGTCATC
GAGCAGAGAATGGA^AGTCAAATTTCCTGAATTGCTATGTGTCTGGGTTTCATCCA 
TCCGACATTGAAGTTGACTTACTGAAGAATGGAGAGAGAATTGAAAAAGTGGAGCA 
TTCAGACTTGTCTTTCAGCAAGGACTGGTCTTTCTATCTCTTGTACTACACTGAAT 

tcacccccactgaaAaagatgagtatgcctgccgtgtgaaccatgtgactttgtca 
cagcccaagatagttaAgtgggatcgagacatgtaagcagcatcatggaggtttga
agatgccgcatttggattggatgaattccaaattctgcttgcttgctttttaatat 

TGATATGCTTATACACTTACACTTTATGCACAAAATGTAGGGTTATAATAATGTTA 
ACATGGACATGATCTTCTTTATAATTCTACTTTGAGTGCTGTCTCCATGTTTGATG 
TATCTGAGCAGGTTGCTCCACAGGTAGCTCTAGGAGGGCTGGCACCTTAGAGGTGG 

GGAGCAGAGAATTCTCTTATCCAACATCAACATCTTGGTCAGATTTGAACTCTT

Sequence ID G6 

GGATTTTTGGTCCGCACGCTCCTGCTCCTGACTCACCGCTGTTCGCTCTCGCCGAG 
GAACAAGTCGGTCAGGAAGCCCGCGCGGAACAGCCATGGCTTTTAAGGATACCGGA 
AAAACACCCGTGGAGCCGGAGGTGGCAATTCACCGAATTCGAATCACCCTAACAAG
CCGCAACGTAAAATCCTTGGAZ^AGGTGTGTGCTGACTTGATAAGAGGCGCAAAAG 
AAAAGAATCTCAAAGTGAAAGGACCAGTTCGAATGCCTACCAAGACTTTGAGAATC 
ACTAGAAGAAAAACTCCTTGTGGTGAAGGTTCTAAGACGTGGGATCGTTTCCAGAT 
GAGAATTCACAAGCGACTCATTGACTTGCACAGTCCTTCTGAGATTGTTAAGCAGA 

TTACTTCCATCAGTATTGAGCCAGGAGTTGAGGTGGAAGTCACCATTGCAGATGCT

TAAGTCAACTATTTTAATAAATTGATGACCAGTTGTTAAAA
Sequence ID - 61 nt: 362

CTTATTGAAAATTTTACTAATTTCTTACrrTTTAG 

TAATrGACTAGCCTCACATTATATTGATAGAGGTTCITGAAAACTTTAATGCCAAT 

TCATGTATCnTATGACTAAAATAGATAATCCATTTAGAAATTTAAGTCATTCTTGC 

GTGCTTGATATGTGTCAGCACTATCCAAGTTGCTAGGGGATACAATGGTGAAGTG 

AAAATATCAGCTAGGTGCCGGTGGCTCACACCTGTTATCCCAACAGTrrGGGAGG 

CCAGGGTGGGAGGATCACTCAAGCACANGCGrrrCACACCAGCCTGGACAACAT

ACAAGACCCCATCTTTACCAAAAGTTAAG 

Sequence ID - 490 nt: 382

TTTTCTTAGAACnnrTATTTTTTCT 

agcactttgggaggccaaggcaggtcgatcacctgaggtcaggagctcaagacc 

agcctggccaacatggtgaaaccctgtcrrctactaaaaatacaaaaattagctgg 

gcgtggtggcgcatgcctgtaatcccanctactcaggaggctgaggcaggagaa 

ttgtttgaacccgggaggcggaggttgcalsrrgagccgagattgcgccactgcact 

ccagcctgggcaacagagcgaaactccatctcaaaaaaaaaaaaaaaaaacaac 

GTTtattttttctgattttaaaagtaataactagtttgtagaaacattaaaagt 

Sequence ID - 892 nt: 559 

tctttcggaagcgcgccttgtgttggtacccgggaattcgcggccgcgtcgacgc 

ggtcgtaaggGctgaggatttttggtccgCacgctcctgctcctgactcaccgct 

gttcgctctcgccgaggaacaagtcggtcaggaagcccgcgcgcaacagccatg 

gcttttaaggataccggaaaaacacccgtggagccggaggtggcaattcaccga 

attcgaatcaccctaacaagccgcaacgtaaaatccttggaaaaggtgtgtgctg 

acttgataagaggcgcaaaagaaaagaatctcaaagtgaaaggaccagttcgaa 

tgcctaccaagactttgagaatcactacaagaaaaactccttgtggtgaaggttc 

taagacgtgggatcgtttccagatgagaattcacaagcgactcattgacttgcac 

agtccttctgagattgttaagcagattacttccatcagtattgagccaggagttg 

aggtggaagtcaccattgcagatgcttaagtcaactattttaataaattgatgac 

cagttgttt 



Sequence ID - 77 nt: 464

gcggctgctgttggttgggggccgtccx:gctcctaaggcaggaagatggtggccg 

caaagaagacgaaaaagtcgctggagtcgatcaactctaggctccaactcgttat 

gaaaagtgggaagtacgtcctggggtacaagcagactctgaagatgatcagaca

aggcaaagcgaaattcm3tcattctcgctaacaactgcccagctttgaggaaatct 

gaaatagagtactatgctatgttggctaaaactggtgtccatcactacagtggca 

ataatattgaactgggcacagcatgcggaaaatactacagagtgtgcacactgg 

ctatcattgatccaggtgactctgacatcattagaagcatgccagaacagactgg 

tgaaaagtaaaccttttcacctacaaaatttcacctocaaaccttaaacctgcaa 

aattttcctttaataaaatttgcttg 



